Commit e618348 (verified) by SwordElucidator finalf0 · 0 Parent(s)

Duplicate from openbmb/MiniCPM-Llama3-V-2_5

Co-authored-by: Hongji Zhu <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,37 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ *.png filter=lfs diff=lfs merge=lfs -text
37
+ *.gif filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,212 @@
1
+ ---
2
+ pipeline_tag: visual-question-answering
3
+ language:
4
+ - en
5
+ - zh
6
+ datasets:
7
+ - openbmb/RLAIF-V-Dataset
8
+ ---
9
+
10
+
11
+ <h1>A GPT-4V Level Multimodal LLM on Your Phone</h1>
12
+
13
+ [GitHub](https://github.com/OpenBMB/MiniCPM-V) | [Demo](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) | <a href="https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/wechat.md" target="_blank"> WeChat</a>
14
+
15
+
16
+ ## News <!-- omit in toc -->
17
+
18
+ #### 📌 Pinned
19
+
20
+ * [2024.05.28] 🚀🚀🚀 MiniCPM-Llama3-V 2.5 is now fully supported in llama.cpp and ollama! Please pull the latest code **from our provided forks** ([llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md), [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5)). GGUF models in various sizes are available [here](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main). The MiniCPM-Llama3-V 2.5 series is **not supported by the official repositories yet**, and we are working hard to merge PRs. Please stay tuned! You can visit our [GitHub](https://github.com/OpenBMB/MiniCPM-V) repository for more information!
21
+ * [2024.05.28] 💫 We now support LoRA fine-tuning for MiniCPM-Llama3-V 2.5, using only 2 V100 GPUs! See more statistics [here](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#model-fine-tuning-memory-usage-statistics).
22
+ * [2024.05.23] 🔍 We've released a comprehensive comparison between Phi-3-vision-128k-instruct and MiniCPM-Llama3-V 2.5, including benchmark evaluations, multilingual capabilities, and inference efficiency 🌟📊🌍🚀. Click [here](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/compare_with_phi-3_vision.md) to view more details.
23
+ * [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and HuggingFace Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available [here](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5). Come and try it out!
24
+
25
+ <br>
26
+
27
+ * [2024.06.03] You can now run MiniCPM-Llama3-V 2.5 on multiple low-VRAM GPUs (12 GB or 16 GB) by distributing the model's layers across them. For more details, check this [link](https://github.com/OpenBMB/MiniCPM-V/blob/main/docs/inference_on_multiple_gpus.md).
28
+ * [2024.05.25] MiniCPM-Llama3-V 2.5 now supports streaming outputs and customized system prompts. Try it [here](#usage).
29
+ * [2024.05.24] We release the [MiniCPM-Llama3-V 2.5 gguf](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf), which supports [llama.cpp](https://github.com/OpenBMB/MiniCPM-V/tree/main?tab=readme-ov-file#inference-with-llamacpp) inference and provides smooth decoding at 6-8 tokens/s on mobile phones. Try it now!
30
+ * [2024.05.20] We open-source MiniCPM-Llama3-V 2.5. It has improved OCR capability, supports 30+ languages, and is the first end-side MLLM to achieve GPT-4V-level performance! We provide [efficient inference](#deployment-on-mobile-phone) and [simple fine-tuning](https://github.com/OpenBMB/MiniCPM-V/blob/main/finetune/readme.md). Try it now!
31
+
32
+
33
+ ## Model Summary
34
+
35
+ **MiniCPM-Llama3-V 2.5** is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0. Notable features of MiniCPM-Llama3-V 2.5 include:
36
+
37
+ - 🔥 **Leading Performance.**
38
+ MiniCPM-Llama3-V 2.5 has achieved an average score of 65.1 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks. **With only 8B parameters, it surpasses widely used proprietary models like GPT-4V-1106, Gemini Pro, Claude 3 and Qwen-VL-Max** and greatly outperforms other Llama 3-based MLLMs.
39
+
40
+ - 💪 **Strong OCR Capabilities.**
41
+ MiniCPM-Llama3-V 2.5 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344), achieving a **700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro**. Based on recent user feedback, MiniCPM-Llama3-V 2.5 has now enhanced full-text OCR extraction, table-to-markdown conversion, and other high-utility capabilities, and has further strengthened its instruction-following and complex reasoning abilities, enhancing multimodal interaction experiences.
42
+
43
+ - 🏆 **Trustworthy Behavior.**
44
+ Leveraging the latest [RLAIF-V](https://github.com/RLHF-V/RLAIF-V/) method (the newest technology in the [RLHF-V](https://github.com/RLHF-V) [CVPR'24] series), MiniCPM-Llama3-V 2.5 exhibits more trustworthy behavior. It achieves a **10.3%** hallucination rate on Object HalBench, lower than GPT-4V-1106 (13.6%), the best performance within the open-source community. [Data released](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset).
45
+
46
+ - 🌏 **Multilingual Support.**
47
+ Thanks to the strong multilingual capabilities of Llama 3 and the cross-lingual generalization technique from [VisCPM](https://github.com/OpenBMB/VisCPM), MiniCPM-Llama3-V 2.5 extends its bilingual (Chinese-English) multimodal capabilities to **over 30 languages, including German, French, Spanish, Italian, Korean, Japanese, etc.** [All Supported Languages](./assets/minicpm-llama-v-2-5_languages.md).
48
+
49
+ - 🚀 **Efficient Deployment.**
50
+ MiniCPM-Llama3-V 2.5 systematically employs **model quantization, CPU optimizations, NPU optimizations and compilation optimizations**, achieving high-efficiency deployment on edge devices. For mobile phones with Qualcomm chips, we have integrated the NPU acceleration framework QNN into llama.cpp for the first time. After systematic optimization, MiniCPM-Llama3-V 2.5 has realized a **150-fold acceleration in multimodal large model end-side image encoding** and a **3-fold increase in language decoding speed**.
51
+
52
+ - 💫 **Easy Usage.**
53
+ MiniCPM-Llama3-V 2.5 can be easily used in various ways: (1) [llama.cpp](https://github.com/OpenBMB/llama.cpp/blob/minicpm-v2.5/examples/minicpmv/README.md) and [ollama](https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5) support for efficient CPU inference on local devices, (2) [GGUF](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf) format quantized models in 16 sizes, (3) efficient [LoRA](https://github.com/OpenBMB/MiniCPM-V/tree/main/finetune#lora-finetuning) fine-tuning with only 2 V100 GPUs, (4) [streaming output](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5#usage), (5) quick local WebUI demo setup with [Gradio](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_2.5.py) and [Streamlit](https://github.com/OpenBMB/MiniCPM-V/blob/main/web_demo_streamlit-2_5.py), and (6) interactive demos on [HuggingFace Spaces](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).
54
+
55
+ ### Evaluation <!-- omit in toc -->
56
+
57
+ Results on TextVQA, DocVQA, OCRBench, OpenCompass MultiModal Avg, MME, MMBench, MMMU, MathVista, LLaVA Bench, RealWorld QA, and Object HalBench.
58
+
59
+ <div align="center">
60
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/64abc4aa6cadc7aca585dddf/v2KE3wqQgM05ZW3dH2wbx.png" width="110%" />
61
+ </div>
62
+
63
+
64
+ Evaluation results on the multilingual LLaVA Bench
65
+ <div align="center">
66
+ <img src="assets/minicpmv-llama3-v2.5/llavabench_compare.png" width="110%" />
67
+ </div>
68
+
69
+
70
+ ### Examples <!-- omit in toc -->
71
+
72
+ <table align="center">
73
+ <p align="center">
74
+ <img src="assets/minicpmv-llama3-v2.5/cases_all.png" width=95%/>
75
+ </p>
76
+ </table>
77
+
78
+ We deploy MiniCPM-Llama3-V 2.5 on end devices. The demo video is a raw screen recording on a Xiaomi 14 Pro, without editing.
79
+
80
+ <table align="center">
81
+ <p align="center">
82
+ <img src="assets/gif_cases/ticket.gif" width=40% style="display:inline-block;"/>
83
+ <img src="assets/gif_cases/meal_plan.gif" width=40% style="display:inline-block;"/>
84
+ </p>
85
+ </table>
86
+
87
+ <table align="center">
88
+ <p align="center">
89
+ <img src="assets/gif_cases/1-4.gif" width=80%/>
90
+ </p>
91
+ </table>
92
+
93
+
94
+
95
+ ## Demo
96
+ Click here to try out the Demo of [MiniCPM-Llama3-V 2.5](https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5).
97
+
98
+ ## Deployment on Mobile Phone
99
+ Coming soon.
100
+
101
+ ## Usage
102
+ Inference using Hugging Face Transformers on NVIDIA GPUs. Requirements tested on Python 3.10:
103
+ ```
104
+ Pillow==10.1.0
105
+ torch==2.1.2
106
+ torchvision==0.16.2
107
+ transformers==4.40.0
108
+ sentencepiece==0.1.99
109
+ ```
110
+
111
+ ```python
112
+ # test.py
113
+ import torch
114
+ from PIL import Image
115
+ from transformers import AutoModel, AutoTokenizer
116
+
117
+ model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, torch_dtype=torch.float16)
118
+ model = model.to(device='cuda')
119
+
120
+ tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
121
+ model.eval()
122
+
123
+ image = Image.open('xx.jpg').convert('RGB')
124
+ question = 'What is in the image?'
125
+ msgs = [{'role': 'user', 'content': question}]
126
+
127
+ res = model.chat(
128
+ image=image,
129
+ msgs=msgs,
130
+ tokenizer=tokenizer,
131
+ sampling=True, # if sampling=False, beam_search will be used by default
132
+ temperature=0.7,
133
+ # system_prompt='' # pass system_prompt if needed
134
+ )
135
+ print(res)
136
+
137
+ ## if you want to use streaming, please make sure sampling=True and stream=True
138
+ ## the model.chat will return a generator
139
+ res = model.chat(
140
+ image=image,
141
+ msgs=msgs,
142
+ tokenizer=tokenizer,
143
+ sampling=True,
144
+ temperature=0.7,
145
+ stream=True
146
+ )
147
+
148
+ generated_text = ""
149
+ for new_text in res:
150
+ generated_text += new_text
151
+ print(new_text, flush=True, end='')
152
+ ```
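
The `msgs` list also supports multi-turn conversations. A minimal sketch (it assumes the same `model`, `tokenizer`, and `image` objects as above, and that `model.chat` accepts earlier replies appended with the `assistant` role):

```python
# First turn
msgs = [{'role': 'user', 'content': 'What is in the image?'}]
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)

# Append the model's reply, then ask a follow-up question in the same context
msgs.append({'role': 'assistant', 'content': answer})
msgs.append({'role': 'user', 'content': 'Describe it in one sentence.'})
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)
print(answer)
```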
153
+
154
+ Please refer to [GitHub](https://github.com/OpenBMB/MiniCPM-V) for more details about usage.
155
+
156
+
157
+ ## Inference with llama.cpp<a id="llamacpp"></a>
158
+ MiniCPM-Llama3-V 2.5 can now run with llama.cpp! See our fork of [llama.cpp](https://github.com/OpenBMB/llama.cpp/tree/minicpm-v2.5/examples/minicpmv) for more details.
159
+
160
+
161
+ ## Int4 quantized version
162
+ Download the int4 quantized version for lower GPU memory (8GB) usage: [MiniCPM-Llama3-V-2_5-int4](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4).
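
A minimal loading sketch, assuming the int4 checkpoint exposes the same `AutoModel`/`chat` interface as the fp16 weights (check its model card for the exact requirements, e.g. `bitsandbytes`):

```python
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# The quantized weights are typically placed on the GPU at load time,
# so no extra .to('cuda') or dtype cast should be needed.
model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5-int4', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5-int4', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': 'What is in the image?'}]
res = model.chat(image=image, msgs=msgs, tokenizer=tokenizer, sampling=True, temperature=0.7)
print(res)
```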
163
+
164
+ ## MiniCPM-V 2.0 <!-- omit in toc -->
165
+ Please see the info about MiniCPM-V 2.0 [here](https://huggingface.co/openbmb/MiniCPM-V-2).
166
+
167
+ ## License
168
+ #### Model License
169
+ * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
170
+ * The usage of MiniCPM-V series model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
171
+ * The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, they are also available for free commercial use.
172
+
173
+
174
+
175
+ #### Statement
176
+ * As an LLM, MiniCPM-Llama3-V 2.5 generates content by learning from a large amount of text, but it cannot comprehend, express personal opinions, or make value judgments. Anything generated by MiniCPM-Llama3-V 2.5 does not represent the views and positions of the model developers.
177
+ * We will not be liable for any problems arising from the use of the MiniCPM-V open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the misdirection, misuse, dissemination, or abuse of the model.
178
+
179
+ ## Other Multimodal Projects from Our Team
180
+
181
+ [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)
182
+
183
+ ## Citation
184
+
185
+ If you find our work helpful, please consider citing the following papers:
186
+
187
+ ```bib
188
+ @article{yu2023rlhf,
189
+ title={Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback},
190
+ author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
191
+ journal={arXiv preprint arXiv:2312.00849},
192
+ year={2023}
193
+ }
194
+ @article{viscpm,
195
+ title={Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages},
196
+ author={Jinyi Hu and Yuan Yao and Chongyi Wang and Shan Wang and Yinxu Pan and Qianyu Chen and Tianyu Yu and Hanghao Wu and Yue Zhao and Haoye Zhang and Xu Han and Yankai Lin and Jiao Xue and Dahai Li and Zhiyuan Liu and Maosong Sun},
197
+ journal={arXiv preprint arXiv:2308.12038},
198
+ year={2023}
199
+ }
200
+ @article{xu2024llava-uhd,
201
+ title={{LLaVA-UHD}: an LMM Perceiving Any Aspect Ratio and High-Resolution Images},
202
+ author={Xu, Ruyi and Yao, Yuan and Guo, Zonghao and Cui, Junbo and Ni, Zanlin and Ge, Chunjiang and Chua, Tat-Seng and Liu, Zhiyuan and Huang, Gao},
203
+ journal={arXiv preprint arXiv:2403.11703},
204
+ year={2024}
205
+ }
206
+ @article{yu2024rlaifv,
207
+ title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness},
208
+ author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
209
+ journal={arXiv preprint arXiv:2405.17220},
210
+ year={2024},
211
+ }
212
+ ```
assets/MiniCPM-Llama3-V-2.5-benchmark.png ADDED

Git LFS Details

  • SHA256: 3e41abe1bf86aa1090731ea03f74aef033c20b6f7184f26326c6f80c2a1475fa
  • Pointer size: 131 Bytes
  • Size of remote file: 321 kB
assets/MiniCPM-Llama3-V-2.5-peformance.png ADDED

Git LFS Details

  • SHA256: 2173b528a73ae14044ffa93e2f45ee9fe3be4361be43aee80c89ecb2e0bf83fa
  • Pointer size: 131 Bytes
  • Size of remote file: 160 kB
assets/gif_cases/1-4.gif ADDED

Git LFS Details

  • SHA256: a862ab0e495bedb4326d56989fb886bca671589810633657723121e7d71b8a6a
  • Pointer size: 133 Bytes
  • Size of remote file: 11.4 MB
assets/gif_cases/meal_plan.gif ADDED

Git LFS Details

  • SHA256: be4758e7a0502e2275f0492b9db41b593a23c7b907ccaaed39565528af6e55ff
  • Pointer size: 132 Bytes
  • Size of remote file: 6.23 MB
assets/gif_cases/ticket.gif ADDED

Git LFS Details

  • SHA256: 6106314877300127b571e2a3f9576300c397de1816c9e3ca356baf2a1e76aaa9
  • Pointer size: 133 Bytes
  • Size of remote file: 19.7 MB
assets/minicpmv-llama3-v2.5/case_OCR_en.png ADDED

Git LFS Details

  • SHA256: 48895d41723873d46eb3c6ab966afc7dc41dd3cf9083c56e220eb38f19c85f92
  • Pointer size: 132 Bytes
  • Size of remote file: 5.87 MB
assets/minicpmv-llama3-v2.5/case_complex_reasoning.png ADDED

Git LFS Details

  • SHA256: 59172f5487c203d0a112cde1c14f84a9db9776f37703b8c1db7b1df494cba85d
  • Pointer size: 132 Bytes
  • Size of remote file: 1.73 MB
assets/minicpmv-llama3-v2.5/case_long_img.png ADDED

Git LFS Details

  • SHA256: b9294bcc3f002c74046c7626a34d766959a19548d08b7408259e6fb9b126d51d
  • Pointer size: 132 Bytes
  • Size of remote file: 3.4 MB
assets/minicpmv-llama3-v2.5/case_markdown.png ADDED

Git LFS Details

  • SHA256: 745359d2a7779b3997311e08b4b90785c75af0a33d039273935661858c06614a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.8 MB
assets/minicpmv-llama3-v2.5/cases_all.png ADDED

Git LFS Details

  • SHA256: 2f8f26e235dec760f4b0d7184462d17bfd095ffbe0b1dfee5c659a5aa8f9a4d7
  • Pointer size: 133 Bytes
  • Size of remote file: 13.2 MB
assets/minicpmv-llama3-v2.5/llavabench_compare.png ADDED

Git LFS Details

  • SHA256: 4b9f2e5a86c152974c39d2fa3b8258ebb2a44c73ecf9e7f8ca3cd0c8c120e190
  • Pointer size: 131 Bytes
  • Size of remote file: 504 kB
config.json ADDED
@@ -0,0 +1,54 @@
1
+ {
2
+ "_name_or_path": "openbmb/MiniCPM-Llama3-V-2_5",
3
+ "architectures": [
4
+ "MiniCPMV"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "auto_map": {
9
+ "AutoConfig": "configuration_minicpm.MiniCPMVConfig",
10
+ "AutoModel": "modeling_minicpmv.MiniCPMV",
11
+ "AutoModelForCausalLM": "modeling_minicpmv.MiniCPMV"
12
+ },
13
+ "batch_vision_input": true,
14
+ "bos_token_id": 128000,
15
+ "drop_vision_last_layer": false,
16
+ "eos_token_id": 128001,
17
+ "hidden_act": "silu",
18
+ "hidden_size": 4096,
19
+ "image_size": 448,
20
+ "initializer_range": 0.02,
21
+ "intermediate_size": 14336,
22
+ "max_position_embeddings": 8192,
23
+ "mm_use_im_start_end": true,
24
+ "model_type": "minicpmv",
25
+ "num_attention_heads": 32,
26
+ "num_hidden_layers": 32,
27
+ "num_key_value_heads": 8,
28
+ "patch_size": 14,
29
+ "pretraining_tp": 1,
30
+ "query_num": 96,
31
+ "rms_norm_eps": 1e-05,
32
+ "rope_scaling": null,
33
+ "rope_theta": 500000.0,
34
+ "slice_config": {
35
+ "max_slice_nums": 9,
36
+ "patch_size": 14,
37
+ "model_type": "minicpmv"
38
+ },
39
+ "slice_mode": true,
40
+ "tie_word_embeddings": false,
41
+ "torch_dtype": "float16",
42
+ "transformers_version": "4.40.0",
43
+ "use_cache": false,
44
+ "vision_config": {
45
+ "hidden_size": 1152,
46
+ "image_size": 980,
47
+ "intermediate_size": 4304,
48
+ "model_type": "idefics2",
49
+ "num_attention_heads": 16,
50
+ "num_hidden_layers": 27,
51
+ "patch_size": 14
52
+ },
53
+ "vocab_size": 128256
54
+ }
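
For reference, the values above can be inspected programmatically. A minimal sketch, assuming the repository's custom `MiniCPMVConfig` is fetched with `trust_remote_code=True`:

```python
from transformers import AutoConfig

# Loads the MiniCPMVConfig defined in configuration_minicpm.py
config = AutoConfig.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)

print(config.model_type)                   # minicpmv
print(config.query_num)                    # 96 resampler query tokens
print(config.slice_config.max_slice_nums)  # at most 9 image slices
print(config.vision_config.image_size)     # 980 (Idefics2-style vision tower)
```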
configuration_minicpm.py ADDED
@@ -0,0 +1,113 @@
1
+ # coding=utf-8
2
+ # Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """ MiniCPM model configuration"""
21
+ import os
22
+ from typing import Union
23
+
24
+ from transformers.utils import logging
25
+ from transformers import LlamaConfig, PretrainedConfig
26
+ from transformers.models.idefics2.modeling_idefics2 import Idefics2VisionConfig
27
+
28
+ logger = logging.get_logger(__name__)
29
+
30
+
31
+ class MiniCPMVSliceConfig(PretrainedConfig):
32
+ model_type = "minicpmv"
33
+
34
+ def __init__(
35
+ self,
36
+ patch_size=14,
37
+ max_slice_nums=9,
38
+ scale_resolution=448,
39
+ **kwargs,
40
+ ):
41
+ super().__init__(**kwargs)
42
+ self.patch_size = patch_size
43
+ self.max_slice_nums = max_slice_nums
44
+ self.scale_resolution = scale_resolution
45
+
46
+ @classmethod
47
+ def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
48
+ cls._set_token_in_kwargs(kwargs)
49
+
50
+ config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
51
+
52
+ if config_dict.get("model_type") == "minicpmv":
53
+ config_dict = config_dict["slice_config"]
54
+
55
+ if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
56
+ logger.warning(
57
+ f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
58
+ f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
59
+ )
60
+
61
+ return cls.from_dict(config_dict, **kwargs)
62
+
63
+
64
+
65
+ class MiniCPMVConfig(LlamaConfig):
66
+ model_type = "minicpmv"
67
+ keys_to_ignore_at_inference = ["past_key_values"]
68
+
69
+ default_vision_config = {
70
+ "hidden_size": 1152,
71
+ "image_size": 980,
72
+ "intermediate_size": 4304,
73
+ "model_type": "idefics2",
74
+ "num_attention_heads": 16,
75
+ "num_hidden_layers": 27,
76
+ "patch_size": 14,
77
+ }
78
+
79
+ def __init__(
80
+ self,
81
+ use_cache=True,
82
+ query_num=64,
83
+ image_size=448,
84
+ drop_vision_last_layer=True,
85
+ batch_vision_input=True,
86
+ slice_config=None,
87
+ vision_config=None,
88
+ **kwargs,
89
+ ):
90
+ self.use_cache = use_cache
91
+ self.query_num = query_num
92
+ self.image_size = image_size
93
+ self.drop_vision_last_layer = drop_vision_last_layer
94
+ self.batch_vision_input = batch_vision_input
95
+
96
+ if slice_config is None:
97
+ self.slice_config = MiniCPMVSliceConfig(max_slice_nums=1)
98
+ else:
99
+ self.slice_config = MiniCPMVSliceConfig(**slice_config)
100
+ self.slice_mode = True
101
+
102
+ # same as HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit
103
+ if vision_config is None:
104
+ self.vision_config = Idefics2VisionConfig(**self.default_vision_config)
105
+ logger.info("vision_config is None, using default vision config")
106
+ elif isinstance(vision_config, dict):
107
+ self.vision_config = Idefics2VisionConfig(**vision_config)
108
+ elif isinstance(vision_config, Idefics2VisionConfig):
109
+ self.vision_config = vision_config
110
+
111
+ self.patch_size = self.vision_config.patch_size
112
+
113
+ super().__init__(**kwargs)
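
The configuration class can also be instantiated directly, for example when experimenting locally. A minimal sketch, assuming `configuration_minicpm.py` is on the import path:

```python
from configuration_minicpm import MiniCPMVConfig

# With no arguments, the default Idefics2-style vision config and a
# single-slice MiniCPMVSliceConfig (max_slice_nums=1) are used.
config = MiniCPMVConfig()
print(config.vision_config.hidden_size)    # 1152
print(config.slice_config.max_slice_nums)  # 1

# Passing dicts mirrors what config.json does for the released checkpoint.
config = MiniCPMVConfig(
    query_num=96,
    image_size=448,
    slice_config={'max_slice_nums': 9, 'patch_size': 14, 'model_type': 'minicpmv'},
)
```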
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 128000,
4
+ "eos_token_id": 128001,
5
+ "transformers_version": "4.40.0"
6
+ }
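
These defaults are read automatically during generation. A minimal sketch for inspecting them, assuming the standard `GenerationConfig` API:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5')
print(gen_config.bos_token_id, gen_config.eos_token_id)  # 128000 128001
```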
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:45cb989d608666713ecd5a1fbd9d9728d560b6f67846aefa29fb980d5b723a80
3
+ size 2443235064
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06ea145d0841c392c2d2a261e83c9523ac7f2f30d2292f7ddbb00f7a28e5dc47
3
+ size 2416006592
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:13aa3033a8c96e1a09326c68f755655d2c98383fef0277c57e851776cb41ace7
3
+ size 2499909808
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a41e014eeaf91645e468279cb092ed3f189b3a8679b4950bd9fcd34ac45a350
3
+ size 2499909832
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec7c6f6f1c80cd8f7d8fde4cc50726e077801cd360ae48ca5506146b6b19a6e5
3
+ size 2416006640
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11b58e0535e426d5c35239e67caedcf3178254031435d28b86e6e2f2dd89229e
3
+ size 2499909816
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:773eae47408e0a463826016c6cd58441dbd9f06c9e5f1998aa77ba60729a7dce
3
+ size 2299293224
model.safetensors.index.json ADDED
@@ -0,0 +1,748 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 34148369344
4
+ },
5
+ "weight_map": {
6
+ "llm.lm_head.weight": "model-00007-of-00007.safetensors",
7
+ "llm.model.embed_tokens.weight": "model-00001-of-00007.safetensors",
8
+ "llm.model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
9
+ "llm.model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
10
+ "llm.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
11
+ "llm.model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
12
+ "llm.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
13
+ "llm.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
14
+ "llm.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
15
+ "llm.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
16
+ "llm.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
17
+ "llm.model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
18
+ "llm.model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
19
+ "llm.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
20
+ "llm.model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
21
+ "llm.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
22
+ "llm.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
23
+ "llm.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
24
+ "llm.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
25
+ "llm.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
26
+ "llm.model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
27
+ "llm.model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
28
+ "llm.model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
29
+ "llm.model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
30
+ "llm.model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
31
+ "llm.model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
32
+ "llm.model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
33
+ "llm.model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
34
+ "llm.model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
35
+ "llm.model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
36
+ "llm.model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
37
+ "llm.model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
38
+ "llm.model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
39
+ "llm.model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
40
+ "llm.model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
41
+ "llm.model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
42
+ "llm.model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
43
+ "llm.model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
44
+ "llm.model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
45
+ "llm.model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
46
+ "llm.model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
47
+ "llm.model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
48
+ "llm.model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
49
+ "llm.model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
50
+ "llm.model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
51
+ "llm.model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
52
+ "llm.model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
53
+ "llm.model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
54
+ "llm.model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
55
+ "llm.model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
56
+ "llm.model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
57
+ "llm.model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
58
+ "llm.model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
59
+ "llm.model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
60
+ "llm.model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
61
+ "llm.model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
62
+ "llm.model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
63
+ "llm.model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
64
+ "llm.model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
65
+ "llm.model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
66
+ "llm.model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
67
+ "llm.model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
68
+ "llm.model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
69
+ "llm.model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
70
+ "llm.model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
71
+ "llm.model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
72
+ "llm.model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
73
+ "llm.model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
74
+ "llm.model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
75
+ "llm.model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
76
+ "llm.model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
77
+ "llm.model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
78
+ "llm.model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
79
+ "llm.model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
80
+ "llm.model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
81
+ "llm.model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
82
+ "llm.model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
83
+ "llm.model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
84
+ "llm.model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
85
+ "llm.model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
86
+ "llm.model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
87
+ "llm.model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
88
+ "llm.model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
89
+ "llm.model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
90
+ "llm.model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
91
+ "llm.model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
92
+ "llm.model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
93
+ "llm.model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
94
+ "llm.model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
95
+ "llm.model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
96
+ "llm.model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
97
+ "llm.model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
98
+ "llm.model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
99
+ "llm.model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
100
+ "llm.model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
101
+ "llm.model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
102
+ "llm.model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
103
+ "llm.model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
104
+ "llm.model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
105
+ "llm.model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
106
+ "llm.model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
107
+ "llm.model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
108
+ "llm.model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
109
+ "llm.model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
110
+ "llm.model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
111
+ "llm.model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
112
+ "llm.model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
113
+ "llm.model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
114
+ "llm.model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
115
+ "llm.model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
116
+ "llm.model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
117
+ "llm.model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
118
+ "llm.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
119
+ "llm.model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
120
+ "llm.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
121
+ "llm.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
122
+ "llm.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
123
+ "llm.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
124
+ "llm.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
125
+ "llm.model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
126
+ "llm.model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
127
+ "llm.model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
128
+ "llm.model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
129
+ "llm.model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
130
+ "llm.model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
131
+ "llm.model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
132
+ "llm.model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
133
+ "llm.model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
134
+ "llm.model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
135
+ "llm.model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
136
+ "llm.model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
137
+ "llm.model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
138
+ "llm.model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
139
+ "llm.model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
140
+ "llm.model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
141
+ "llm.model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
142
+ "llm.model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
143
+ "llm.model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
144
+ "llm.model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
145
+ "llm.model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
146
+ "llm.model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
147
+ "llm.model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
148
+ "llm.model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
149
+ "llm.model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
150
+ "llm.model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
151
+ "llm.model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
152
+ "llm.model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
153
+ "llm.model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
154
+ "llm.model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
155
+ "llm.model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
156
+ "llm.model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
157
+ "llm.model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
158
+ "llm.model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
159
+ "llm.model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
160
+ "llm.model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
161
+ "llm.model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
162
+ "llm.model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
163
+ "llm.model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
164
+ "llm.model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
165
+ "llm.model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
166
+ "llm.model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
167
+ "llm.model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
168
+ "llm.model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
169
+ "llm.model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
170
+ "llm.model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
171
+ "llm.model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
172
+ "llm.model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
173
+ "llm.model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
174
+ "llm.model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
175
+ "llm.model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
176
+ "llm.model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
177
+ "llm.model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
178
+ "llm.model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
179
+ "llm.model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
180
+ "llm.model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
181
+ "llm.model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
182
+ "llm.model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
183
+ "llm.model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
184
+ "llm.model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
185
+ "llm.model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
186
+ "llm.model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
187
+ "llm.model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
188
+ "llm.model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
189
+ "llm.model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
190
+ "llm.model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
191
+ "llm.model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
192
+ "llm.model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
193
+ "llm.model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
194
+ "llm.model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
195
+ "llm.model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
196
+ "llm.model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
197
+ "llm.model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
198
+ "llm.model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
199
+ "llm.model.layers.28.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
200
+ "llm.model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
201
+ "llm.model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
202
+ "llm.model.layers.28.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
203
+ "llm.model.layers.28.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
204
+ "llm.model.layers.28.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
205
+ "llm.model.layers.28.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
206
+ "llm.model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
207
+ "llm.model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
208
+ "llm.model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
209
+ "llm.model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
210
+ "llm.model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
211
+ "llm.model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
212
+ "llm.model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
213
+ "llm.model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
214
+ "llm.model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
215
+ "llm.model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
216
+ "llm.model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
217
+ "llm.model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
218
+ "llm.model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
219
+ "llm.model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
220
+ "llm.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
221
+ "llm.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
222
+ "llm.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
223
+ "llm.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
224
+ "llm.model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
225
+ "llm.model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
226
+ "llm.model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
227
+ "llm.model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
228
+ "llm.model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
229
+ "llm.model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
230
+ "llm.model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
231
+ "llm.model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
232
+ "llm.model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
233
+ "llm.model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
234
+ "llm.model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
235
+ "llm.model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
236
+ "llm.model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
237
+ "llm.model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
238
+ "llm.model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
239
+ "llm.model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
240
+ "llm.model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
241
+ "llm.model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
242
+ "llm.model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
243
+ "llm.model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
244
+ "llm.model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
245
+ "llm.model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
246
+ "llm.model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
247
+ "llm.model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
248
+ "llm.model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
249
+ "llm.model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
250
+ "llm.model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
251
+ "llm.model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
252
+ "llm.model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
253
+ "llm.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
254
+ "llm.model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
255
+ "llm.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
256
+ "llm.model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
257
+ "llm.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
258
+ "llm.model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
259
+ "llm.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
260
+ "llm.model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
261
+ "llm.model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
262
+ "llm.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
263
+ "llm.model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
264
+ "llm.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
265
+ "llm.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
266
+ "llm.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
267
+ "llm.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
268
+ "llm.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
269
+ "llm.model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
270
+ "llm.model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
271
+ "llm.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
272
+ "llm.model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
273
+ "llm.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
274
+ "llm.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
275
+ "llm.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
276
+ "llm.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
277
+ "llm.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
278
+ "llm.model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
279
+ "llm.model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
280
+ "llm.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
281
+ "llm.model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
282
+ "llm.model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
283
+ "llm.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
284
+ "llm.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
285
+ "llm.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
286
+ "llm.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
287
+ "llm.model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
288
+ "llm.model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
289
+ "llm.model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
290
+ "llm.model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
291
+ "llm.model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
292
+ "llm.model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
293
+ "llm.model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
294
+ "llm.model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
295
+ "llm.model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
296
+ "llm.model.norm.weight": "model-00007-of-00007.safetensors",
297
+ "resampler.attn.in_proj_bias": "model-00007-of-00007.safetensors",
298
+ "resampler.attn.in_proj_weight": "model-00007-of-00007.safetensors",
299
+ "resampler.attn.out_proj.bias": "model-00007-of-00007.safetensors",
300
+ "resampler.attn.out_proj.weight": "model-00007-of-00007.safetensors",
301
+ "resampler.kv_proj.weight": "model-00007-of-00007.safetensors",
302
+ "resampler.ln_kv.bias": "model-00007-of-00007.safetensors",
303
+ "resampler.ln_kv.weight": "model-00007-of-00007.safetensors",
304
+ "resampler.ln_post.bias": "model-00007-of-00007.safetensors",
305
+ "resampler.ln_post.weight": "model-00007-of-00007.safetensors",
306
+ "resampler.ln_q.bias": "model-00007-of-00007.safetensors",
307
+ "resampler.ln_q.weight": "model-00007-of-00007.safetensors",
308
+ "resampler.proj": "model-00007-of-00007.safetensors",
309
+ "resampler.query": "model-00007-of-00007.safetensors",
310
+ "vpm.embeddings.patch_embedding.bias": "model-00007-of-00007.safetensors",
311
+ "vpm.embeddings.patch_embedding.weight": "model-00007-of-00007.safetensors",
312
+ "vpm.embeddings.position_embedding.weight": "model-00007-of-00007.safetensors",
313
+ "vpm.encoder.layers.0.layer_norm1.bias": "model-00007-of-00007.safetensors",
314
+ "vpm.encoder.layers.0.layer_norm1.weight": "model-00007-of-00007.safetensors",
315
+ "vpm.encoder.layers.0.layer_norm2.bias": "model-00007-of-00007.safetensors",
316
+ "vpm.encoder.layers.0.layer_norm2.weight": "model-00007-of-00007.safetensors",
317
+ "vpm.encoder.layers.0.mlp.fc1.bias": "model-00007-of-00007.safetensors",
318
+ "vpm.encoder.layers.0.mlp.fc1.weight": "model-00007-of-00007.safetensors",
319
+ "vpm.encoder.layers.0.mlp.fc2.bias": "model-00007-of-00007.safetensors",
320
+ "vpm.encoder.layers.0.mlp.fc2.weight": "model-00007-of-00007.safetensors",
321
+ "vpm.encoder.layers.0.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
322
+ "vpm.encoder.layers.0.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
323
+ "vpm.encoder.layers.0.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
324
+ "vpm.encoder.layers.0.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
325
+ "vpm.encoder.layers.0.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
326
+ "vpm.encoder.layers.0.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
327
+ "vpm.encoder.layers.0.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
328
+ "vpm.encoder.layers.0.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
329
+ "vpm.encoder.layers.1.layer_norm1.bias": "model-00007-of-00007.safetensors",
330
+ "vpm.encoder.layers.1.layer_norm1.weight": "model-00007-of-00007.safetensors",
331
+ "vpm.encoder.layers.1.layer_norm2.bias": "model-00007-of-00007.safetensors",
332
+ "vpm.encoder.layers.1.layer_norm2.weight": "model-00007-of-00007.safetensors",
333
+ "vpm.encoder.layers.1.mlp.fc1.bias": "model-00007-of-00007.safetensors",
334
+ "vpm.encoder.layers.1.mlp.fc1.weight": "model-00007-of-00007.safetensors",
335
+ "vpm.encoder.layers.1.mlp.fc2.bias": "model-00007-of-00007.safetensors",
336
+ "vpm.encoder.layers.1.mlp.fc2.weight": "model-00007-of-00007.safetensors",
337
+ "vpm.encoder.layers.1.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
338
+ "vpm.encoder.layers.1.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
339
+ "vpm.encoder.layers.1.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
340
+ "vpm.encoder.layers.1.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
341
+ "vpm.encoder.layers.1.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
342
+ "vpm.encoder.layers.1.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
343
+ "vpm.encoder.layers.1.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
344
+ "vpm.encoder.layers.1.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
345
+ "vpm.encoder.layers.10.layer_norm1.bias": "model-00007-of-00007.safetensors",
346
+ "vpm.encoder.layers.10.layer_norm1.weight": "model-00007-of-00007.safetensors",
347
+ "vpm.encoder.layers.10.layer_norm2.bias": "model-00007-of-00007.safetensors",
348
+ "vpm.encoder.layers.10.layer_norm2.weight": "model-00007-of-00007.safetensors",
349
+ "vpm.encoder.layers.10.mlp.fc1.bias": "model-00007-of-00007.safetensors",
350
+ "vpm.encoder.layers.10.mlp.fc1.weight": "model-00007-of-00007.safetensors",
351
+ "vpm.encoder.layers.10.mlp.fc2.bias": "model-00007-of-00007.safetensors",
352
+ "vpm.encoder.layers.10.mlp.fc2.weight": "model-00007-of-00007.safetensors",
353
+ "vpm.encoder.layers.10.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
354
+ "vpm.encoder.layers.10.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
355
+ "vpm.encoder.layers.10.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
356
+ "vpm.encoder.layers.10.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
357
+ "vpm.encoder.layers.10.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
358
+ "vpm.encoder.layers.10.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
359
+ "vpm.encoder.layers.10.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
360
+ "vpm.encoder.layers.10.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
361
+ "vpm.encoder.layers.11.layer_norm1.bias": "model-00007-of-00007.safetensors",
362
+ "vpm.encoder.layers.11.layer_norm1.weight": "model-00007-of-00007.safetensors",
363
+ "vpm.encoder.layers.11.layer_norm2.bias": "model-00007-of-00007.safetensors",
364
+ "vpm.encoder.layers.11.layer_norm2.weight": "model-00007-of-00007.safetensors",
365
+ "vpm.encoder.layers.11.mlp.fc1.bias": "model-00007-of-00007.safetensors",
366
+ "vpm.encoder.layers.11.mlp.fc1.weight": "model-00007-of-00007.safetensors",
367
+ "vpm.encoder.layers.11.mlp.fc2.bias": "model-00007-of-00007.safetensors",
368
+ "vpm.encoder.layers.11.mlp.fc2.weight": "model-00007-of-00007.safetensors",
369
+ "vpm.encoder.layers.11.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
370
+ "vpm.encoder.layers.11.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
371
+ "vpm.encoder.layers.11.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
372
+ "vpm.encoder.layers.11.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
373
+ "vpm.encoder.layers.11.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
374
+ "vpm.encoder.layers.11.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
375
+ "vpm.encoder.layers.11.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
376
+ "vpm.encoder.layers.11.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
377
+ "vpm.encoder.layers.12.layer_norm1.bias": "model-00007-of-00007.safetensors",
378
+ "vpm.encoder.layers.12.layer_norm1.weight": "model-00007-of-00007.safetensors",
379
+ "vpm.encoder.layers.12.layer_norm2.bias": "model-00007-of-00007.safetensors",
380
+ "vpm.encoder.layers.12.layer_norm2.weight": "model-00007-of-00007.safetensors",
381
+ "vpm.encoder.layers.12.mlp.fc1.bias": "model-00007-of-00007.safetensors",
382
+ "vpm.encoder.layers.12.mlp.fc1.weight": "model-00007-of-00007.safetensors",
383
+ "vpm.encoder.layers.12.mlp.fc2.bias": "model-00007-of-00007.safetensors",
384
+ "vpm.encoder.layers.12.mlp.fc2.weight": "model-00007-of-00007.safetensors",
385
+ "vpm.encoder.layers.12.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
386
+ "vpm.encoder.layers.12.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
387
+ "vpm.encoder.layers.12.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
388
+ "vpm.encoder.layers.12.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
389
+ "vpm.encoder.layers.12.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
390
+ "vpm.encoder.layers.12.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
391
+ "vpm.encoder.layers.12.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
392
+ "vpm.encoder.layers.12.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
393
+ "vpm.encoder.layers.13.layer_norm1.bias": "model-00007-of-00007.safetensors",
394
+ "vpm.encoder.layers.13.layer_norm1.weight": "model-00007-of-00007.safetensors",
395
+ "vpm.encoder.layers.13.layer_norm2.bias": "model-00007-of-00007.safetensors",
396
+ "vpm.encoder.layers.13.layer_norm2.weight": "model-00007-of-00007.safetensors",
397
+ "vpm.encoder.layers.13.mlp.fc1.bias": "model-00007-of-00007.safetensors",
398
+ "vpm.encoder.layers.13.mlp.fc1.weight": "model-00007-of-00007.safetensors",
399
+ "vpm.encoder.layers.13.mlp.fc2.bias": "model-00007-of-00007.safetensors",
400
+ "vpm.encoder.layers.13.mlp.fc2.weight": "model-00007-of-00007.safetensors",
401
+ "vpm.encoder.layers.13.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
402
+ "vpm.encoder.layers.13.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
403
+ "vpm.encoder.layers.13.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
404
+ "vpm.encoder.layers.13.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
405
+ "vpm.encoder.layers.13.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
406
+ "vpm.encoder.layers.13.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
407
+ "vpm.encoder.layers.13.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
408
+ "vpm.encoder.layers.13.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
409
+ "vpm.encoder.layers.14.layer_norm1.bias": "model-00007-of-00007.safetensors",
410
+ "vpm.encoder.layers.14.layer_norm1.weight": "model-00007-of-00007.safetensors",
411
+ "vpm.encoder.layers.14.layer_norm2.bias": "model-00007-of-00007.safetensors",
412
+ "vpm.encoder.layers.14.layer_norm2.weight": "model-00007-of-00007.safetensors",
413
+ "vpm.encoder.layers.14.mlp.fc1.bias": "model-00007-of-00007.safetensors",
414
+ "vpm.encoder.layers.14.mlp.fc1.weight": "model-00007-of-00007.safetensors",
415
+ "vpm.encoder.layers.14.mlp.fc2.bias": "model-00007-of-00007.safetensors",
416
+ "vpm.encoder.layers.14.mlp.fc2.weight": "model-00007-of-00007.safetensors",
417
+ "vpm.encoder.layers.14.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
418
+ "vpm.encoder.layers.14.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
419
+ "vpm.encoder.layers.14.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
420
+ "vpm.encoder.layers.14.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
421
+ "vpm.encoder.layers.14.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
422
+ "vpm.encoder.layers.14.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
423
+ "vpm.encoder.layers.14.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
424
+ "vpm.encoder.layers.14.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
425
+ "vpm.encoder.layers.15.layer_norm1.bias": "model-00007-of-00007.safetensors",
426
+ "vpm.encoder.layers.15.layer_norm1.weight": "model-00007-of-00007.safetensors",
427
+ "vpm.encoder.layers.15.layer_norm2.bias": "model-00007-of-00007.safetensors",
428
+ "vpm.encoder.layers.15.layer_norm2.weight": "model-00007-of-00007.safetensors",
429
+ "vpm.encoder.layers.15.mlp.fc1.bias": "model-00007-of-00007.safetensors",
430
+ "vpm.encoder.layers.15.mlp.fc1.weight": "model-00007-of-00007.safetensors",
431
+ "vpm.encoder.layers.15.mlp.fc2.bias": "model-00007-of-00007.safetensors",
432
+ "vpm.encoder.layers.15.mlp.fc2.weight": "model-00007-of-00007.safetensors",
433
+ "vpm.encoder.layers.15.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
434
+ "vpm.encoder.layers.15.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
435
+ "vpm.encoder.layers.15.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
436
+ "vpm.encoder.layers.15.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
437
+ "vpm.encoder.layers.15.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
438
+ "vpm.encoder.layers.15.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
439
+ "vpm.encoder.layers.15.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
440
+ "vpm.encoder.layers.15.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
441
+ "vpm.encoder.layers.16.layer_norm1.bias": "model-00007-of-00007.safetensors",
442
+ "vpm.encoder.layers.16.layer_norm1.weight": "model-00007-of-00007.safetensors",
443
+ "vpm.encoder.layers.16.layer_norm2.bias": "model-00007-of-00007.safetensors",
444
+ "vpm.encoder.layers.16.layer_norm2.weight": "model-00007-of-00007.safetensors",
445
+ "vpm.encoder.layers.16.mlp.fc1.bias": "model-00007-of-00007.safetensors",
446
+ "vpm.encoder.layers.16.mlp.fc1.weight": "model-00007-of-00007.safetensors",
447
+ "vpm.encoder.layers.16.mlp.fc2.bias": "model-00007-of-00007.safetensors",
448
+ "vpm.encoder.layers.16.mlp.fc2.weight": "model-00007-of-00007.safetensors",
449
+ "vpm.encoder.layers.16.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
450
+ "vpm.encoder.layers.16.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
451
+ "vpm.encoder.layers.16.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
452
+ "vpm.encoder.layers.16.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
453
+ "vpm.encoder.layers.16.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
454
+ "vpm.encoder.layers.16.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
455
+ "vpm.encoder.layers.16.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
456
+ "vpm.encoder.layers.16.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
457
+ "vpm.encoder.layers.17.layer_norm1.bias": "model-00007-of-00007.safetensors",
458
+ "vpm.encoder.layers.17.layer_norm1.weight": "model-00007-of-00007.safetensors",
459
+ "vpm.encoder.layers.17.layer_norm2.bias": "model-00007-of-00007.safetensors",
460
+ "vpm.encoder.layers.17.layer_norm2.weight": "model-00007-of-00007.safetensors",
461
+ "vpm.encoder.layers.17.mlp.fc1.bias": "model-00007-of-00007.safetensors",
462
+ "vpm.encoder.layers.17.mlp.fc1.weight": "model-00007-of-00007.safetensors",
463
+ "vpm.encoder.layers.17.mlp.fc2.bias": "model-00007-of-00007.safetensors",
464
+ "vpm.encoder.layers.17.mlp.fc2.weight": "model-00007-of-00007.safetensors",
465
+ "vpm.encoder.layers.17.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
466
+ "vpm.encoder.layers.17.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
467
+ "vpm.encoder.layers.17.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
468
+ "vpm.encoder.layers.17.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
469
+ "vpm.encoder.layers.17.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
470
+ "vpm.encoder.layers.17.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
471
+ "vpm.encoder.layers.17.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
472
+ "vpm.encoder.layers.17.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
473
+ "vpm.encoder.layers.18.layer_norm1.bias": "model-00007-of-00007.safetensors",
474
+ "vpm.encoder.layers.18.layer_norm1.weight": "model-00007-of-00007.safetensors",
475
+ "vpm.encoder.layers.18.layer_norm2.bias": "model-00007-of-00007.safetensors",
476
+ "vpm.encoder.layers.18.layer_norm2.weight": "model-00007-of-00007.safetensors",
477
+ "vpm.encoder.layers.18.mlp.fc1.bias": "model-00007-of-00007.safetensors",
478
+ "vpm.encoder.layers.18.mlp.fc1.weight": "model-00007-of-00007.safetensors",
479
+ "vpm.encoder.layers.18.mlp.fc2.bias": "model-00007-of-00007.safetensors",
480
+ "vpm.encoder.layers.18.mlp.fc2.weight": "model-00007-of-00007.safetensors",
481
+ "vpm.encoder.layers.18.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
482
+ "vpm.encoder.layers.18.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
483
+ "vpm.encoder.layers.18.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
484
+ "vpm.encoder.layers.18.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
485
+ "vpm.encoder.layers.18.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
486
+ "vpm.encoder.layers.18.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
487
+ "vpm.encoder.layers.18.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
488
+ "vpm.encoder.layers.18.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
489
+ "vpm.encoder.layers.19.layer_norm1.bias": "model-00007-of-00007.safetensors",
490
+ "vpm.encoder.layers.19.layer_norm1.weight": "model-00007-of-00007.safetensors",
491
+ "vpm.encoder.layers.19.layer_norm2.bias": "model-00007-of-00007.safetensors",
492
+ "vpm.encoder.layers.19.layer_norm2.weight": "model-00007-of-00007.safetensors",
493
+ "vpm.encoder.layers.19.mlp.fc1.bias": "model-00007-of-00007.safetensors",
494
+ "vpm.encoder.layers.19.mlp.fc1.weight": "model-00007-of-00007.safetensors",
495
+ "vpm.encoder.layers.19.mlp.fc2.bias": "model-00007-of-00007.safetensors",
496
+ "vpm.encoder.layers.19.mlp.fc2.weight": "model-00007-of-00007.safetensors",
497
+ "vpm.encoder.layers.19.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
498
+ "vpm.encoder.layers.19.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
499
+ "vpm.encoder.layers.19.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
500
+ "vpm.encoder.layers.19.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
501
+ "vpm.encoder.layers.19.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
502
+ "vpm.encoder.layers.19.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
503
+ "vpm.encoder.layers.19.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
504
+ "vpm.encoder.layers.19.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
505
+ "vpm.encoder.layers.2.layer_norm1.bias": "model-00007-of-00007.safetensors",
506
+ "vpm.encoder.layers.2.layer_norm1.weight": "model-00007-of-00007.safetensors",
507
+ "vpm.encoder.layers.2.layer_norm2.bias": "model-00007-of-00007.safetensors",
508
+ "vpm.encoder.layers.2.layer_norm2.weight": "model-00007-of-00007.safetensors",
509
+ "vpm.encoder.layers.2.mlp.fc1.bias": "model-00007-of-00007.safetensors",
510
+ "vpm.encoder.layers.2.mlp.fc1.weight": "model-00007-of-00007.safetensors",
511
+ "vpm.encoder.layers.2.mlp.fc2.bias": "model-00007-of-00007.safetensors",
512
+ "vpm.encoder.layers.2.mlp.fc2.weight": "model-00007-of-00007.safetensors",
513
+ "vpm.encoder.layers.2.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
514
+ "vpm.encoder.layers.2.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
515
+ "vpm.encoder.layers.2.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
516
+ "vpm.encoder.layers.2.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
517
+ "vpm.encoder.layers.2.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
518
+ "vpm.encoder.layers.2.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
519
+ "vpm.encoder.layers.2.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
520
+ "vpm.encoder.layers.2.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
521
+ "vpm.encoder.layers.20.layer_norm1.bias": "model-00007-of-00007.safetensors",
522
+ "vpm.encoder.layers.20.layer_norm1.weight": "model-00007-of-00007.safetensors",
523
+ "vpm.encoder.layers.20.layer_norm2.bias": "model-00007-of-00007.safetensors",
524
+ "vpm.encoder.layers.20.layer_norm2.weight": "model-00007-of-00007.safetensors",
525
+ "vpm.encoder.layers.20.mlp.fc1.bias": "model-00007-of-00007.safetensors",
526
+ "vpm.encoder.layers.20.mlp.fc1.weight": "model-00007-of-00007.safetensors",
527
+ "vpm.encoder.layers.20.mlp.fc2.bias": "model-00007-of-00007.safetensors",
528
+ "vpm.encoder.layers.20.mlp.fc2.weight": "model-00007-of-00007.safetensors",
529
+ "vpm.encoder.layers.20.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
530
+ "vpm.encoder.layers.20.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
531
+ "vpm.encoder.layers.20.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
532
+ "vpm.encoder.layers.20.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
533
+ "vpm.encoder.layers.20.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
534
+ "vpm.encoder.layers.20.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
535
+ "vpm.encoder.layers.20.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
536
+ "vpm.encoder.layers.20.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
537
+ "vpm.encoder.layers.21.layer_norm1.bias": "model-00007-of-00007.safetensors",
538
+ "vpm.encoder.layers.21.layer_norm1.weight": "model-00007-of-00007.safetensors",
539
+ "vpm.encoder.layers.21.layer_norm2.bias": "model-00007-of-00007.safetensors",
540
+ "vpm.encoder.layers.21.layer_norm2.weight": "model-00007-of-00007.safetensors",
541
+ "vpm.encoder.layers.21.mlp.fc1.bias": "model-00007-of-00007.safetensors",
542
+ "vpm.encoder.layers.21.mlp.fc1.weight": "model-00007-of-00007.safetensors",
543
+ "vpm.encoder.layers.21.mlp.fc2.bias": "model-00007-of-00007.safetensors",
544
+ "vpm.encoder.layers.21.mlp.fc2.weight": "model-00007-of-00007.safetensors",
545
+ "vpm.encoder.layers.21.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
546
+ "vpm.encoder.layers.21.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
547
+ "vpm.encoder.layers.21.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
548
+ "vpm.encoder.layers.21.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
549
+ "vpm.encoder.layers.21.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
550
+ "vpm.encoder.layers.21.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
551
+ "vpm.encoder.layers.21.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
552
+ "vpm.encoder.layers.21.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
553
+ "vpm.encoder.layers.22.layer_norm1.bias": "model-00007-of-00007.safetensors",
554
+ "vpm.encoder.layers.22.layer_norm1.weight": "model-00007-of-00007.safetensors",
555
+ "vpm.encoder.layers.22.layer_norm2.bias": "model-00007-of-00007.safetensors",
556
+ "vpm.encoder.layers.22.layer_norm2.weight": "model-00007-of-00007.safetensors",
557
+ "vpm.encoder.layers.22.mlp.fc1.bias": "model-00007-of-00007.safetensors",
558
+ "vpm.encoder.layers.22.mlp.fc1.weight": "model-00007-of-00007.safetensors",
559
+ "vpm.encoder.layers.22.mlp.fc2.bias": "model-00007-of-00007.safetensors",
560
+ "vpm.encoder.layers.22.mlp.fc2.weight": "model-00007-of-00007.safetensors",
561
+ "vpm.encoder.layers.22.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
562
+ "vpm.encoder.layers.22.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
563
+ "vpm.encoder.layers.22.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
564
+ "vpm.encoder.layers.22.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
565
+ "vpm.encoder.layers.22.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
566
+ "vpm.encoder.layers.22.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
567
+ "vpm.encoder.layers.22.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
568
+ "vpm.encoder.layers.22.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
569
+ "vpm.encoder.layers.23.layer_norm1.bias": "model-00007-of-00007.safetensors",
570
+ "vpm.encoder.layers.23.layer_norm1.weight": "model-00007-of-00007.safetensors",
571
+ "vpm.encoder.layers.23.layer_norm2.bias": "model-00007-of-00007.safetensors",
572
+ "vpm.encoder.layers.23.layer_norm2.weight": "model-00007-of-00007.safetensors",
573
+ "vpm.encoder.layers.23.mlp.fc1.bias": "model-00007-of-00007.safetensors",
574
+ "vpm.encoder.layers.23.mlp.fc1.weight": "model-00007-of-00007.safetensors",
575
+ "vpm.encoder.layers.23.mlp.fc2.bias": "model-00007-of-00007.safetensors",
576
+ "vpm.encoder.layers.23.mlp.fc2.weight": "model-00007-of-00007.safetensors",
577
+ "vpm.encoder.layers.23.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
578
+ "vpm.encoder.layers.23.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
579
+ "vpm.encoder.layers.23.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
580
+ "vpm.encoder.layers.23.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
581
+ "vpm.encoder.layers.23.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
582
+ "vpm.encoder.layers.23.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
583
+ "vpm.encoder.layers.23.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
584
+ "vpm.encoder.layers.23.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
585
+ "vpm.encoder.layers.24.layer_norm1.bias": "model-00007-of-00007.safetensors",
586
+ "vpm.encoder.layers.24.layer_norm1.weight": "model-00007-of-00007.safetensors",
587
+ "vpm.encoder.layers.24.layer_norm2.bias": "model-00007-of-00007.safetensors",
588
+ "vpm.encoder.layers.24.layer_norm2.weight": "model-00007-of-00007.safetensors",
589
+ "vpm.encoder.layers.24.mlp.fc1.bias": "model-00007-of-00007.safetensors",
590
+ "vpm.encoder.layers.24.mlp.fc1.weight": "model-00007-of-00007.safetensors",
591
+ "vpm.encoder.layers.24.mlp.fc2.bias": "model-00007-of-00007.safetensors",
592
+ "vpm.encoder.layers.24.mlp.fc2.weight": "model-00007-of-00007.safetensors",
593
+ "vpm.encoder.layers.24.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
594
+ "vpm.encoder.layers.24.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
595
+ "vpm.encoder.layers.24.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
596
+ "vpm.encoder.layers.24.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
597
+ "vpm.encoder.layers.24.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
598
+ "vpm.encoder.layers.24.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
599
+ "vpm.encoder.layers.24.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
600
+ "vpm.encoder.layers.24.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
601
+ "vpm.encoder.layers.25.layer_norm1.bias": "model-00007-of-00007.safetensors",
602
+ "vpm.encoder.layers.25.layer_norm1.weight": "model-00007-of-00007.safetensors",
603
+ "vpm.encoder.layers.25.layer_norm2.bias": "model-00007-of-00007.safetensors",
604
+ "vpm.encoder.layers.25.layer_norm2.weight": "model-00007-of-00007.safetensors",
605
+ "vpm.encoder.layers.25.mlp.fc1.bias": "model-00007-of-00007.safetensors",
606
+ "vpm.encoder.layers.25.mlp.fc1.weight": "model-00007-of-00007.safetensors",
607
+ "vpm.encoder.layers.25.mlp.fc2.bias": "model-00007-of-00007.safetensors",
608
+ "vpm.encoder.layers.25.mlp.fc2.weight": "model-00007-of-00007.safetensors",
609
+ "vpm.encoder.layers.25.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
610
+ "vpm.encoder.layers.25.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
611
+ "vpm.encoder.layers.25.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
612
+ "vpm.encoder.layers.25.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
613
+ "vpm.encoder.layers.25.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
614
+ "vpm.encoder.layers.25.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
615
+ "vpm.encoder.layers.25.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
616
+ "vpm.encoder.layers.25.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
617
+ "vpm.encoder.layers.26.layer_norm1.bias": "model-00007-of-00007.safetensors",
618
+ "vpm.encoder.layers.26.layer_norm1.weight": "model-00007-of-00007.safetensors",
619
+ "vpm.encoder.layers.26.layer_norm2.bias": "model-00007-of-00007.safetensors",
620
+ "vpm.encoder.layers.26.layer_norm2.weight": "model-00007-of-00007.safetensors",
621
+ "vpm.encoder.layers.26.mlp.fc1.bias": "model-00007-of-00007.safetensors",
622
+ "vpm.encoder.layers.26.mlp.fc1.weight": "model-00007-of-00007.safetensors",
623
+ "vpm.encoder.layers.26.mlp.fc2.bias": "model-00007-of-00007.safetensors",
624
+ "vpm.encoder.layers.26.mlp.fc2.weight": "model-00007-of-00007.safetensors",
625
+ "vpm.encoder.layers.26.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
626
+ "vpm.encoder.layers.26.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
627
+ "vpm.encoder.layers.26.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
628
+ "vpm.encoder.layers.26.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
629
+ "vpm.encoder.layers.26.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
630
+ "vpm.encoder.layers.26.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
631
+ "vpm.encoder.layers.26.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
632
+ "vpm.encoder.layers.26.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
633
+ "vpm.encoder.layers.3.layer_norm1.bias": "model-00007-of-00007.safetensors",
634
+ "vpm.encoder.layers.3.layer_norm1.weight": "model-00007-of-00007.safetensors",
635
+ "vpm.encoder.layers.3.layer_norm2.bias": "model-00007-of-00007.safetensors",
636
+ "vpm.encoder.layers.3.layer_norm2.weight": "model-00007-of-00007.safetensors",
637
+ "vpm.encoder.layers.3.mlp.fc1.bias": "model-00007-of-00007.safetensors",
638
+ "vpm.encoder.layers.3.mlp.fc1.weight": "model-00007-of-00007.safetensors",
639
+ "vpm.encoder.layers.3.mlp.fc2.bias": "model-00007-of-00007.safetensors",
640
+ "vpm.encoder.layers.3.mlp.fc2.weight": "model-00007-of-00007.safetensors",
641
+ "vpm.encoder.layers.3.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
642
+ "vpm.encoder.layers.3.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
643
+ "vpm.encoder.layers.3.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
644
+ "vpm.encoder.layers.3.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
645
+ "vpm.encoder.layers.3.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
646
+ "vpm.encoder.layers.3.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
647
+ "vpm.encoder.layers.3.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
648
+ "vpm.encoder.layers.3.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
649
+ "vpm.encoder.layers.4.layer_norm1.bias": "model-00007-of-00007.safetensors",
650
+ "vpm.encoder.layers.4.layer_norm1.weight": "model-00007-of-00007.safetensors",
651
+ "vpm.encoder.layers.4.layer_norm2.bias": "model-00007-of-00007.safetensors",
652
+ "vpm.encoder.layers.4.layer_norm2.weight": "model-00007-of-00007.safetensors",
653
+ "vpm.encoder.layers.4.mlp.fc1.bias": "model-00007-of-00007.safetensors",
654
+ "vpm.encoder.layers.4.mlp.fc1.weight": "model-00007-of-00007.safetensors",
655
+ "vpm.encoder.layers.4.mlp.fc2.bias": "model-00007-of-00007.safetensors",
656
+ "vpm.encoder.layers.4.mlp.fc2.weight": "model-00007-of-00007.safetensors",
657
+ "vpm.encoder.layers.4.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
658
+ "vpm.encoder.layers.4.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
659
+ "vpm.encoder.layers.4.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
660
+ "vpm.encoder.layers.4.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
661
+ "vpm.encoder.layers.4.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
662
+ "vpm.encoder.layers.4.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
663
+ "vpm.encoder.layers.4.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
664
+ "vpm.encoder.layers.4.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
665
+ "vpm.encoder.layers.5.layer_norm1.bias": "model-00007-of-00007.safetensors",
666
+ "vpm.encoder.layers.5.layer_norm1.weight": "model-00007-of-00007.safetensors",
667
+ "vpm.encoder.layers.5.layer_norm2.bias": "model-00007-of-00007.safetensors",
668
+ "vpm.encoder.layers.5.layer_norm2.weight": "model-00007-of-00007.safetensors",
669
+ "vpm.encoder.layers.5.mlp.fc1.bias": "model-00007-of-00007.safetensors",
670
+ "vpm.encoder.layers.5.mlp.fc1.weight": "model-00007-of-00007.safetensors",
671
+ "vpm.encoder.layers.5.mlp.fc2.bias": "model-00007-of-00007.safetensors",
672
+ "vpm.encoder.layers.5.mlp.fc2.weight": "model-00007-of-00007.safetensors",
673
+ "vpm.encoder.layers.5.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
674
+ "vpm.encoder.layers.5.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
675
+ "vpm.encoder.layers.5.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
676
+ "vpm.encoder.layers.5.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
677
+ "vpm.encoder.layers.5.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
678
+ "vpm.encoder.layers.5.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
679
+ "vpm.encoder.layers.5.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
680
+ "vpm.encoder.layers.5.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
681
+ "vpm.encoder.layers.6.layer_norm1.bias": "model-00007-of-00007.safetensors",
682
+ "vpm.encoder.layers.6.layer_norm1.weight": "model-00007-of-00007.safetensors",
683
+ "vpm.encoder.layers.6.layer_norm2.bias": "model-00007-of-00007.safetensors",
684
+ "vpm.encoder.layers.6.layer_norm2.weight": "model-00007-of-00007.safetensors",
685
+ "vpm.encoder.layers.6.mlp.fc1.bias": "model-00007-of-00007.safetensors",
686
+ "vpm.encoder.layers.6.mlp.fc1.weight": "model-00007-of-00007.safetensors",
687
+ "vpm.encoder.layers.6.mlp.fc2.bias": "model-00007-of-00007.safetensors",
688
+ "vpm.encoder.layers.6.mlp.fc2.weight": "model-00007-of-00007.safetensors",
689
+ "vpm.encoder.layers.6.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
690
+ "vpm.encoder.layers.6.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
691
+ "vpm.encoder.layers.6.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
692
+ "vpm.encoder.layers.6.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
693
+ "vpm.encoder.layers.6.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
694
+ "vpm.encoder.layers.6.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
695
+ "vpm.encoder.layers.6.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
696
+ "vpm.encoder.layers.6.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
697
+ "vpm.encoder.layers.7.layer_norm1.bias": "model-00007-of-00007.safetensors",
698
+ "vpm.encoder.layers.7.layer_norm1.weight": "model-00007-of-00007.safetensors",
699
+ "vpm.encoder.layers.7.layer_norm2.bias": "model-00007-of-00007.safetensors",
700
+ "vpm.encoder.layers.7.layer_norm2.weight": "model-00007-of-00007.safetensors",
701
+ "vpm.encoder.layers.7.mlp.fc1.bias": "model-00007-of-00007.safetensors",
702
+ "vpm.encoder.layers.7.mlp.fc1.weight": "model-00007-of-00007.safetensors",
703
+ "vpm.encoder.layers.7.mlp.fc2.bias": "model-00007-of-00007.safetensors",
704
+ "vpm.encoder.layers.7.mlp.fc2.weight": "model-00007-of-00007.safetensors",
705
+ "vpm.encoder.layers.7.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
706
+ "vpm.encoder.layers.7.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
707
+ "vpm.encoder.layers.7.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
708
+ "vpm.encoder.layers.7.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
709
+ "vpm.encoder.layers.7.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
710
+ "vpm.encoder.layers.7.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
711
+ "vpm.encoder.layers.7.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
712
+ "vpm.encoder.layers.7.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
713
+ "vpm.encoder.layers.8.layer_norm1.bias": "model-00007-of-00007.safetensors",
714
+ "vpm.encoder.layers.8.layer_norm1.weight": "model-00007-of-00007.safetensors",
715
+ "vpm.encoder.layers.8.layer_norm2.bias": "model-00007-of-00007.safetensors",
716
+ "vpm.encoder.layers.8.layer_norm2.weight": "model-00007-of-00007.safetensors",
717
+ "vpm.encoder.layers.8.mlp.fc1.bias": "model-00007-of-00007.safetensors",
718
+ "vpm.encoder.layers.8.mlp.fc1.weight": "model-00007-of-00007.safetensors",
719
+ "vpm.encoder.layers.8.mlp.fc2.bias": "model-00007-of-00007.safetensors",
720
+ "vpm.encoder.layers.8.mlp.fc2.weight": "model-00007-of-00007.safetensors",
721
+ "vpm.encoder.layers.8.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
722
+ "vpm.encoder.layers.8.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
723
+ "vpm.encoder.layers.8.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
724
+ "vpm.encoder.layers.8.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
725
+ "vpm.encoder.layers.8.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
726
+ "vpm.encoder.layers.8.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
727
+ "vpm.encoder.layers.8.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
728
+ "vpm.encoder.layers.8.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
729
+ "vpm.encoder.layers.9.layer_norm1.bias": "model-00007-of-00007.safetensors",
730
+ "vpm.encoder.layers.9.layer_norm1.weight": "model-00007-of-00007.safetensors",
731
+ "vpm.encoder.layers.9.layer_norm2.bias": "model-00007-of-00007.safetensors",
732
+ "vpm.encoder.layers.9.layer_norm2.weight": "model-00007-of-00007.safetensors",
733
+ "vpm.encoder.layers.9.mlp.fc1.bias": "model-00007-of-00007.safetensors",
734
+ "vpm.encoder.layers.9.mlp.fc1.weight": "model-00007-of-00007.safetensors",
735
+ "vpm.encoder.layers.9.mlp.fc2.bias": "model-00007-of-00007.safetensors",
736
+ "vpm.encoder.layers.9.mlp.fc2.weight": "model-00007-of-00007.safetensors",
737
+ "vpm.encoder.layers.9.self_attn.k_proj.bias": "model-00007-of-00007.safetensors",
738
+ "vpm.encoder.layers.9.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
739
+ "vpm.encoder.layers.9.self_attn.out_proj.bias": "model-00007-of-00007.safetensors",
740
+ "vpm.encoder.layers.9.self_attn.out_proj.weight": "model-00007-of-00007.safetensors",
741
+ "vpm.encoder.layers.9.self_attn.q_proj.bias": "model-00007-of-00007.safetensors",
742
+ "vpm.encoder.layers.9.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
743
+ "vpm.encoder.layers.9.self_attn.v_proj.bias": "model-00007-of-00007.safetensors",
744
+ "vpm.encoder.layers.9.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
745
+ "vpm.post_layernorm.bias": "model-00007-of-00007.safetensors",
746
+ "vpm.post_layernorm.weight": "model-00007-of-00007.safetensors"
747
+ }
748
+ }
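The weight map above assigns the resampler and vision-tower (`vpm`) parameters to the final shard, `model-00007-of-00007.safetensors`. `from_pretrained` consumes this index automatically, but it can also be inspected directly; a minimal sketch using only the standard library (the local path is illustrative):

```python
import json
from collections import Counter

# Illustrative path to a locally downloaded copy of the checkpoint directory.
index_path = "MiniCPM-Llama3-V-2_5/model.safetensors.index.json"

with open(index_path) as f:
    index = json.load(f)

weight_map = index["weight_map"]        # parameter name -> shard filename
print(Counter(weight_map.values()))     # number of tensors stored in each shard
print(weight_map["resampler.proj"])     # -> "model-00007-of-00007.safetensors"
```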
modeling_minicpmv.py ADDED
@@ -0,0 +1,702 @@
1
+ import math
2
+ from typing import List, Optional
3
+ import json
4
+ import torch
5
+ import torchvision
6
+ from threading import Thread
7
+ from copy import deepcopy
8
+ from PIL import Image
9
+ from torchvision import transforms
10
+ from transformers import LlamaTokenizer, LlamaPreTrainedModel, LlamaForCausalLM, AutoModel, PreTrainedTokenizerFast, TextIteratorStreamer
11
+ from transformers.models.idefics2.modeling_idefics2 import Idefics2VisionTransformer
12
+
13
+ from .configuration_minicpm import MiniCPMVConfig
14
+ from .resampler import Resampler
15
+
16
+ IMAGENET_INCEPTION_MEAN = (0.5, 0.5, 0.5) # timm.data.IMAGENET_INCEPTION_MEAN
17
+ IMAGENET_INCEPTION_STD = (0.5, 0.5, 0.5) # timm.data.IMAGENET_INCEPTION_STD
18
+
19
+ class MiniCPMVPreTrainedModel(LlamaPreTrainedModel):
20
+ config_class = MiniCPMVConfig
21
+
22
+
23
+ class MiniCPMV(MiniCPMVPreTrainedModel):
24
+ def __init__(self, config):
25
+ super().__init__(config)
26
+
27
+ self.llm = LlamaForCausalLM(config)
28
+ self.vpm = self.init_vision_module()
29
+ self.vision_dim = self.vpm.embed_dim
30
+ self.embed_dim = self.llm.config.hidden_size
31
+ self.resampler = self.init_resampler(self.embed_dim, self.vision_dim)
32
+ self.transform = self.init_transform()
33
+
34
+ def init_vision_module(self):
35
+ # same as HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit
36
+ model = Idefics2VisionTransformer(self.config.vision_config)
37
+ if self.config.drop_vision_last_layer:
38
+ model.encoder.layers = model.encoder.layers[:-1]
39
+
40
+ setattr(model, 'embed_dim', model.embeddings.embed_dim)
41
+ setattr(model, 'patch_size', model.embeddings.patch_size)
42
+
43
+ return model
44
+
45
+ def init_resampler(self, embed_dim, vision_dim):
46
+ return Resampler(
47
+ num_queries=self.config.query_num,
48
+ embed_dim=embed_dim,
49
+ num_heads=embed_dim // 128,
50
+ kv_dim=vision_dim,
51
+ adaptive=True
52
+ )
53
+
54
+ def init_transform(self):
55
+ return transforms.Compose(
56
+ [
57
+ transforms.ToTensor(),
58
+ transforms.Normalize(
59
+ mean=IMAGENET_INCEPTION_MEAN, std=IMAGENET_INCEPTION_STD
60
+ ),
61
+ ]
62
+ )
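+ # ToTensor() scales pixels to [0, 1]; normalizing with mean = std = 0.5 then maps them to
+ # [-1, 1], the Inception-style range used by the SigLIP-based vision encoder above.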
63
+
64
+ def get_input_embeddings(self):
65
+ return self.llm.get_input_embeddings()
66
+
67
+ def set_input_embeddings(self, value):
68
+ self.llm.embed_tokens = value
69
+
70
+ def get_vllm_embedding(self, data):
71
+ if 'vision_hidden_states' not in data:
72
+ dtype = self.vpm.embeddings.position_embedding.weight.dtype
73
+ device = self.vpm.embeddings.position_embedding.weight.device
74
+ tgt_sizes = data['tgt_sizes']
75
+ pixel_values_list = data['pixel_values']
76
+ vision_hidden_states = []
77
+ all_pixel_values = []
78
+ img_cnt = []
79
+ for pixel_values in pixel_values_list:
80
+ img_cnt.append(len(pixel_values))
81
+ all_pixel_values.extend([i.flatten(end_dim=1).permute(1, 0) for i in pixel_values])
82
+
83
+ # at least one image is present in the batch
84
+ if all_pixel_values:
85
+ tgt_sizes = torch.vstack(tgt_sizes).type(torch.int32)
86
+
87
+ if self.config.batch_vision_input:
88
+ max_patches = torch.max(tgt_sizes[:, 0] * tgt_sizes[:, 1])
89
+
90
+ all_pixel_values = torch.nn.utils.rnn.pad_sequence(all_pixel_values, batch_first=True,
91
+ padding_value=0.0)
92
+ B, L, _ = all_pixel_values.shape
93
+ all_pixel_values = all_pixel_values.permute(0, 2, 1).reshape(B, 3, -1, L)
94
+
95
+ patch_attn_mask = torch.zeros((B, 1, max_patches), dtype=torch.bool, device=device)
96
+ for i in range(B):
97
+ patch_attn_mask[i, 0, :tgt_sizes[i][0] * tgt_sizes[i][1]] = True  # mark only the real (non-padded) patches as valid
98
+
99
+ vision_embedding = self.vpm(all_pixel_values.type(dtype), patch_attention_mask=patch_attn_mask).last_hidden_state
100
+ vision_embedding = self.resampler(vision_embedding, tgt_sizes)
101
+ else:
102
+ # compute vision_embedding for each image individually
103
+ vision_embedding = []
104
+ for single_tgt_size, single_pixel_values in zip(tgt_sizes, all_pixel_values):
105
+ single_pixel_values = single_pixel_values.unsqueeze(0)
106
+ B, L, _ = single_pixel_values.shape
107
+ single_pixel_values = single_pixel_values.permute(0, 2, 1).reshape(B, 3, -1, L)
108
+ single_vision_embedding = self.vpm(single_pixel_values.type(dtype)).last_hidden_state
109
+ single_vision_embedding = self.resampler(single_vision_embedding, single_tgt_size.unsqueeze(0))
110
+ vision_embedding.append(single_vision_embedding)
111
+ vision_embedding = torch.vstack(vision_embedding)
112
+
113
+ start = 0
114
+ for pixel_values in pixel_values_list:
115
+ img_cnt = len(pixel_values)
116
+ if img_cnt > 0:
117
+ vision_hidden_states.append(vision_embedding[start: start + img_cnt])
118
+ start += img_cnt
119
+ else:
120
+ vision_hidden_states.append([])
121
+ else: # no image
122
+ if self.training:
123
+ dummy_image = torch.zeros(
124
+ (1, 3, 224, 224),
125
+ device=device, dtype=dtype
126
+ )
127
+ tgt_sizes = torch.Tensor([[(224 // self.config.patch_size), math.ceil(224 / self.config.patch_size)]]).type(torch.int32)
128
+ dummy_feature = self.resampler(self.vpm(dummy_image).last_hidden_state, tgt_sizes)
129
+ else:
130
+ dummy_feature = []
131
+ for _ in range(len(pixel_values_list)):
132
+ vision_hidden_states.append(dummy_feature)
133
+
134
+ else:
135
+ vision_hidden_states = data['vision_hidden_states']
136
+
137
+ if hasattr(self.llm.config, 'scale_emb'):
138
+ vllm_embedding = self.llm.model.embed_tokens(data['input_ids']) * self.llm.config.scale_emb
139
+ else:
140
+ vllm_embedding = self.llm.model.embed_tokens(data['input_ids'])
141
+
142
+ vision_hidden_states = [i.type(vllm_embedding.dtype) if isinstance(
143
+ i, torch.Tensor) else i for i in vision_hidden_states]
144
+
145
+ bs = len(data['input_ids'])
146
+ for i in range(bs):
147
+ cur_vs_hs = vision_hidden_states[i]
148
+ if len(cur_vs_hs) > 0:
149
+ cur_vllm_emb = vllm_embedding[i]
150
+ cur_image_bound = data['image_bound'][i]
151
+ if len(cur_image_bound) > 0:
152
+ image_indices = torch.stack(
153
+ [torch.arange(r[0], r[1], dtype=torch.long) for r in cur_image_bound]
154
+ ).to(vllm_embedding.device)
155
+
156
+ cur_vllm_emb.scatter_(0, image_indices.view(-1, 1).repeat(1, cur_vllm_emb.shape[-1]),
157
+ cur_vs_hs.view(-1, cur_vs_hs.shape[-1]))
158
+ elif self.training:
159
+ cur_vllm_emb += cur_vs_hs[0].mean() * 0
160
+
161
+ return vllm_embedding, vision_hidden_states
162
+
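+ # In short: get_vllm_embedding embeds input_ids with the LLM's token embeddings, then uses
+ # data['image_bound'] to scatter the resampled vision features over the image placeholder
+ # positions, so the language model receives one mixed sequence of text and image embeddings.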
163
+ def forward(self, data, **kwargs):
164
+ vllm_embedding, vision_hidden_states = self.get_vllm_embedding(data)
165
+ position_ids = data["position_ids"]
166
+ if position_ids.dtype != torch.int64:
167
+ position_ids = position_ids.long()
168
+
169
+ return self.llm(
170
+ input_ids=None,
171
+ position_ids=position_ids,
172
+ inputs_embeds=vllm_embedding,
173
+ **kwargs
174
+ )
175
+
176
+ def _convert_to_tensors(
177
+ self, tokenizer, input_ids, max_inp_length: Optional[int] = None
178
+ ):
179
+ if max_inp_length is not None:
180
+ input_ids = input_ids[:max_inp_length]
181
+ input_ids = torch.tensor(input_ids, dtype=torch.int32)
182
+
183
+ image_start_tokens = torch.where(input_ids == tokenizer.im_start_id)[0]
184
+ # skip the im_start token itself
185
+ image_start_tokens += 1
186
+ image_end_tokens = torch.where(input_ids == tokenizer.im_end_id)[0]
187
+ valid_image_nums = max(len(image_start_tokens), len(image_end_tokens))
188
+ image_bound = torch.hstack(
189
+ [
190
+ image_start_tokens[:valid_image_nums].unsqueeze(-1),
191
+ image_end_tokens[:valid_image_nums].unsqueeze(-1),
192
+ ]
193
+ )
194
+
195
+ model_input = {}
196
+ model_input["input_ids"] = input_ids.unsqueeze(0).to(self.device)
197
+ model_input["image_bound"] = image_bound
198
+
199
+ return model_input
200
+
201
+ def _process_list(
202
+ self, tokenizer, input_id_list, max_inp_length: Optional[int] = None
203
+ ):
204
+ pad_keys = ["input_ids"]
205
+ input_tensors = []
206
+ for input_ids in input_id_list:
207
+ input_tensors.append(
208
+ self._convert_to_tensors(tokenizer, input_ids, max_inp_length)
209
+ )
210
+ padded = {}
211
+ for key in pad_keys:
212
+ padded[key] = pad(input_tensors, key, padding_side="left").to(self.device)
213
+ padded["image_bound"] = [i["image_bound"] for i in input_tensors]
214
+ return padded
215
+
216
+ def _decode(self, inputs_embeds, tokenizer, **kwargs):
217
+ terminators = [
218
+ tokenizer.eos_token_id,
219
+ tokenizer.convert_tokens_to_ids("<|eot_id|>")
220
+ ]
221
+ output = self.llm.generate(
222
+ inputs_embeds=inputs_embeds,
223
+ pad_token_id=0,
224
+ eos_token_id=terminators,
225
+ **kwargs
226
+ )
227
+ return self._decode_text(output, tokenizer)
228
+
229
+ def _decode_stream(self, inputs_embeds, tokenizer, **kwargs):
230
+ terminators = [
231
+ tokenizer.eos_token_id,
232
+ tokenizer.convert_tokens_to_ids("<|eot_id|>")
233
+ ]
234
+ streamer = TextIteratorStreamer(tokenizer=tokenizer)
235
+ generation_kwargs = {
236
+ 'inputs_embeds': inputs_embeds,
237
+ 'pad_token_id': 0,
238
+ 'eos_token_id': terminators,
239
+ 'streamer': streamer
240
+ }
241
+ generation_kwargs.update(kwargs)
242
+
243
+ thread = Thread(target=self.llm.generate, kwargs=generation_kwargs)
244
+ thread.start()
245
+
246
+ return streamer
247
+
248
+ def _decode_text(self, result_ids, tokenizer):
249
+ result_text = []
250
+ for result in result_ids:
251
+ result = result[result != 0]
252
+ if result[0] == tokenizer.bos_id:
253
+ result = result[1:]
254
+ if result[-1] == tokenizer.eos_id or result[-1] == tokenizer.eot_id:
255
+ result = result[:-1]
256
+ result_text.append(tokenizer.decode(result).strip())
257
+ return result_text
258
+
259
+ def slice_image(self, image):
260
+ return slice_image(
261
+ image,
262
+ self.config.slice_config.max_slice_nums,
263
+ self.config.slice_config.scale_resolution,
264
+ self.config.slice_config.patch_size,
265
+ )
266
+
267
+ def get_slice_image_placeholder(self, image, tokenizer):
268
+ image_placeholder = (
269
+ tokenizer.im_start
270
+ + tokenizer.unk_token * self.config.query_num
271
+ + tokenizer.im_end
272
+ )
273
+
274
+ slice_images = []
275
+
276
+ source_image, patches, best_grid = slice_image(
277
+ image,
278
+ self.config.slice_config.max_slice_nums,
279
+ self.config.slice_config.scale_resolution,
280
+ self.config.slice_config.patch_size,
281
+ )
282
+
283
+ slice_images.append(source_image)
284
+ final_placeholder = image_placeholder
285
+
286
+ if len(patches) > 0:
287
+ for i in range(len(patches)):
288
+ for j in range(len(patches[0])):
289
+ slice_images.append(patches[i][j])
290
+
291
+ final_placeholder += get_grid_placeholder(
292
+ tokenizer, best_grid, self.config.query_num
293
+ )
294
+
295
+ return slice_images, final_placeholder
296
+
297
+ def reshape_by_patch(self, image_tensor):
298
+ """
299
+ :param image_tensor: shape [3, H, W]
+ :return: [3, patch_size, H*W/patch_size], using patch_size from self.config
302
+ """
303
+ patch_size = self.config.patch_size
304
+ patches = torch.nn.functional.unfold(
305
+ image_tensor,
306
+ (patch_size, patch_size),
307
+ stride=(patch_size, patch_size)
308
+ )
309
+
310
+ patches = patches.reshape(image_tensor.size(0), patch_size, patch_size, -1)
311
+ patches = patches.permute(0, 1, 3, 2).reshape(image_tensor.size(0), patch_size, -1)
312
+ return patches
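+ # Example: with patch_size=14, a [3, 28, 42] image tensor is unfolded into 6 patches of 14x14
+ # and returned with shape [3, 14, 84] (84 = 28 * 42 / 14).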
313
+
314
+ def generate(
315
+ self,
316
+ input_id_list=None,
317
+ img_list=None,
318
+ tgt_sizes=None,
319
+ tokenizer=None,
320
+ max_inp_length: Optional[int] = None,
321
+ vision_hidden_states=None,
322
+ return_vision_hidden_states=False,
323
+ stream=False,
324
+ **kwargs
325
+ ):
326
+
327
+ assert input_id_list is not None
328
+ bs = len(input_id_list)
329
+ if img_list is None:
330
+ img_list = [[] for i in range(bs)]
331
+ assert bs == len(img_list)
332
+
333
+ model_inputs = self._process_list(tokenizer, input_id_list, max_inp_length)
334
+
335
+ if vision_hidden_states is None:
336
+ pixel_values = []
337
+ for i in range(bs):
338
+ img_inps = []
339
+ for img in img_list[i]:
340
+ img_inps.append(img.to(self.device))
341
+ if img_inps:
342
+ pixel_values.append(img_inps)
343
+ else:
344
+ pixel_values.append([])
345
+ model_inputs["pixel_values"] = pixel_values
346
+ model_inputs['tgt_sizes'] = tgt_sizes
347
+ else:
348
+ model_inputs["vision_hidden_states"] = vision_hidden_states
349
+
350
+ with torch.inference_mode():
351
+ (
352
+ model_inputs["inputs_embeds"],
353
+ vision_hidden_states,
354
+ ) = self.get_vllm_embedding(model_inputs)
355
+
356
+ if stream:
357
+ result = self._decode_stream(model_inputs["inputs_embeds"], tokenizer, **kwargs)
358
+ else:
359
+ result = self._decode(model_inputs["inputs_embeds"], tokenizer, **kwargs)
360
+
361
+ if return_vision_hidden_states:
362
+ return result, vision_hidden_states
363
+
364
+ return result
365
+
366
+ def chat(
367
+ self,
368
+ image,
369
+ msgs,
370
+ tokenizer,
371
+ vision_hidden_states=None,
372
+ max_new_tokens=1024,
373
+ sampling=True,
374
+ max_inp_length=2048,
375
+ system_prompt='',
376
+ stream=False,
377
+ **kwargs
378
+ ):
379
+ if isinstance(msgs, str):
380
+ msgs = json.loads(msgs)
381
+
382
+ copy_msgs = deepcopy(msgs)
383
+ assert len(copy_msgs) > 0, 'msgs is empty'
384
+ assert sampling or not stream, 'if using stream mode, make sure sampling=True'
385
+
386
+ if image is not None and isinstance(copy_msgs[0]['content'], str):
387
+ copy_msgs[0]['content'] = [image, copy_msgs[0]['content']]
388
+
389
+ images = []
390
+ tgt_sizes = []
391
+ for i, msg in enumerate(copy_msgs):
392
+ role = msg["role"]
393
+ content = msg["content"]
394
+ assert role in ["user", "assistant"]
395
+ if i == 0:
396
+ assert role == "user", "The role of first msg should be user"
397
+ if isinstance(content, str):
398
+ content = [content]
399
+
400
+ cur_msgs = []
401
+ for c in content:
402
+ if isinstance(c, Image.Image):
403
+ image = c
404
+ if self.config.slice_mode:
405
+ slice_images, image_placeholder = self.get_slice_image_placeholder(
406
+ image, tokenizer
407
+ )
408
+ cur_msgs.append(image_placeholder)
409
+ for slice_image in slice_images:
410
+ slice_image = self.transform(slice_image)
411
+ H, W = slice_image.shape[1:]
412
+ images.append(self.reshape_by_patch(slice_image))
413
+ tgt_sizes.append(torch.Tensor([H // self.config.patch_size, W // self.config.patch_size]).type(torch.int32))
414
+ else:
415
+ images.append(self.transform(image))
416
+ cur_msgs.append(
417
+ tokenizer.im_start
418
+ + tokenizer.unk_token * self.config.query_num
419
+ + tokenizer.im_end
420
+ )
421
+ elif isinstance(c, str):
422
+ cur_msgs.append(c)
423
+
424
+
425
+ msg['content'] = '\n'.join(cur_msgs)
426
+ if tgt_sizes:
427
+ tgt_sizes = torch.vstack(tgt_sizes)
428
+
429
+ if system_prompt:
430
+ sys_msg = {'role': 'system', 'content': system_prompt}
431
+ copy_msgs = [sys_msg] + copy_msgs
432
+
433
+ input_ids = tokenizer.apply_chat_template(copy_msgs, tokenize=True, add_generation_prompt=False)
434
+
435
+ if sampling:
436
+ generation_config = {
437
+ "top_p": 0.8,
438
+ "top_k": 100,
439
+ "temperature": 0.7,
440
+ "do_sample": True,
441
+ "repetition_penalty": 1.05
442
+ }
443
+ else:
444
+ generation_config = {
445
+ "num_beams": 3,
446
+ "repetition_penalty": 1.2,
447
+ }
448
+
449
+ generation_config.update(
450
+ (k, kwargs[k]) for k in generation_config.keys() & kwargs.keys()
451
+ )
452
+
453
+ with torch.inference_mode():
454
+ res, vision_hidden_states = self.generate(
455
+ input_id_list=[input_ids],
456
+ max_inp_length=max_inp_length,
457
+ img_list=[images],
458
+ tgt_sizes=[tgt_sizes],
459
+ tokenizer=tokenizer,
460
+ max_new_tokens=max_new_tokens,
461
+ vision_hidden_states=vision_hidden_states,
462
+ return_vision_hidden_states=True,
463
+ stream=stream,
464
+ **generation_config
465
+ )
466
+
467
+ if stream:
468
+ def stream_gen():
469
+ for text in res:
470
+ text = text.replace(tokenizer.eot_token, '').replace(tokenizer.eos_token, '')
471
+ yield text
472
+ return stream_gen()
473
+
474
+ else:
475
+ answer = res[0]
476
+ return answer
477
+
478
+
479
+ class PreTrainedTokenizerFastWrapper(PreTrainedTokenizerFast):
480
+ def __init__(self, **kwargs):
481
+ super().__init__(**kwargs)
482
+ self.eot_token = "<|eot_id|>"
483
+ self.im_start = "<image>"
484
+ self.im_end = "</image>"
485
+ self.ref_start = "<ref>"
486
+ self.ref_end = "</ref>"
487
+ self.box_start = "<box>"
488
+ self.box_end = "</box>"
489
+ self.quad_start = "<quad>"
490
+ self.quad_end = "</quad>"
491
+ self.slice_start = "<slice>"
492
+ self.slice_end = "</slice>"
493
+
494
+ @property
495
+ def eos_id(self):
496
+ return self.eos_token_id
497
+
498
+ @property
499
+ def bos_id(self):
500
+ return self.bos_token_id
501
+
502
+ @property
503
+ def unk_id(self):
504
+ return self.unk_token_id
505
+
506
+ @property
507
+ def eot_id(self):
508
+ return self.convert_tokens_to_ids(self.eot_token)
509
+
510
+ @property
511
+ def im_start_id(self):
512
+ return self.convert_tokens_to_ids(self.im_start)
513
+
514
+ @property
515
+ def im_end_id(self):
516
+ return self.convert_tokens_to_ids(self.im_end)
517
+
518
+ @staticmethod
519
+ def escape(text: str) -> str:
520
+ return text
521
+
522
+ @staticmethod
523
+ def unescape(text: str) -> str:
524
+ return text
525
+
526
+
527
+ def pad(orig_items, key, max_length=None, padding_value=0, padding_side="left"):
528
+ items = []
529
+ if isinstance(orig_items[0][key], list):
530
+ assert isinstance(orig_items[0][key][0], torch.Tensor)
531
+ for it in orig_items:
532
+ for tr in it[key]:
533
+ items.append({key: tr})
534
+ else:
535
+ assert isinstance(orig_items[0][key], torch.Tensor)
536
+ items = orig_items
537
+
538
+ batch_size = len(items)
539
+ shape = items[0][key].shape
540
+ dim = len(shape)
541
+ assert dim <= 3
542
+ if max_length is None:
543
+ max_length = 0
544
+ max_length = max(max_length, max(item[key].shape[-1] for item in items))
545
+ min_length = min(item[key].shape[-1] for item in items)
546
+ dtype = items[0][key].dtype
547
+
548
+ if dim == 1:
549
+ return torch.cat([item[key] for item in items], dim=0)
550
+ elif dim == 2:
551
+ if max_length == min_length:
552
+ return torch.cat([item[key] for item in items], dim=0)
553
+ tensor = torch.zeros((batch_size, max_length), dtype=dtype) + padding_value
554
+ else:
555
+ tensor = (
556
+ torch.zeros((batch_size, max_length, shape[-1]), dtype=dtype)
557
+ + padding_value
558
+ )
559
+
560
+ for i, item in enumerate(items):
561
+ if dim == 2:
562
+ if padding_side == "left":
563
+ tensor[i, -len(item[key][0]) :] = item[key][0].clone()
564
+ else:
565
+ tensor[i, : len(item[key][0])] = item[key][0].clone()
566
+ elif dim == 3:
567
+ if padding_side == "left":
568
+ tensor[i, -len(item[key][0]) :, :] = item[key][0].clone()
569
+ else:
570
+ tensor[i, : len(item[key][0]), :] = item[key][0].clone()
571
+
572
+ return tensor
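+ # pad() defaults to left padding so that every sequence in the batch ends at the last position,
+ # keeping each prompt flush against the tokens generated next by the decoder-only LLM.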
573
+
574
+
575
+ def slice_image(
576
+ image, max_slice_nums=9, scale_resolution=448, patch_size=14, never_split=False
577
+ ):
578
+ original_size = image.size
579
+ original_width, original_height = original_size
580
+ log_ratio = math.log(original_width / original_height)
581
+ ratio = original_width * original_height / (scale_resolution * scale_resolution)
582
+ multiple = min(math.ceil(ratio), max_slice_nums)
583
+
584
+ source_image = None
585
+ best_grid = None
586
+ patches = []
587
+
588
+ if multiple <= 1 or never_split:
589
+ # no need to slice; just upsample to the best size
590
+ best_size = find_best_resize(
591
+ original_size, scale_resolution, patch_size, allow_upscale=True
592
+ )
593
+ source_image = image.resize(best_size, Image.Resampling.BICUBIC)
594
+ else:
595
+ candidate_split_grids_nums = []
596
+ for i in [multiple - 1, multiple, multiple + 1]:
597
+ if i == 1 or i > max_slice_nums:
598
+ continue
599
+ candidate_split_grids_nums.append(i)
600
+
601
+ # source image: downsample and make its dimensions divisible by patch_size
602
+ best_resize = find_best_resize(original_size, scale_resolution, patch_size)
603
+ source_image = image.copy().resize(best_resize, Image.Resampling.BICUBIC)
604
+ candidate_grids = []
605
+
606
+ # find best grid
607
+ for split_grids_nums in candidate_split_grids_nums:
608
+ m = 1
609
+ while m <= split_grids_nums:
610
+ if split_grids_nums % m == 0:
611
+ candidate_grids.append([m, split_grids_nums // m])
612
+ m += 1
613
+
614
+ best_grid = [1, 1]
615
+ min_error = float("inf")
616
+ for grid in candidate_grids:
617
+ error = abs(log_ratio - math.log(grid[0] / grid[1]))
618
+ if error < min_error:
619
+ best_grid = grid
620
+ min_error = error
621
+
622
+ refine_size = get_refine_size(
623
+ original_size, best_grid, scale_resolution, patch_size, allow_upscale=True
624
+ )
625
+
626
+ refine_image = image.resize(refine_size, Image.Resampling.BICUBIC)
627
+ patches = split_to_patches(refine_image, best_grid)
628
+
629
+ return source_image, patches, best_grid
630
+
631
+
632
+ def ensure_divide(length, patch_size):
633
+ return max(round(length / patch_size) * patch_size, patch_size)
634
+
635
+
636
+ def find_best_resize(original_size, scale_resolution, patch_size, allow_upscale=False):
637
+ width, height = original_size
638
+ if (width * height > scale_resolution * scale_resolution) or allow_upscale:
639
+ r = width / height
640
+ height = int(scale_resolution / math.sqrt(r))
641
+ width = int(height * r)
642
+ best_width = ensure_divide(width, patch_size)
643
+ best_height = ensure_divide(height, patch_size)
644
+ return (best_width, best_height)
645
+
646
+
647
+ def get_refine_size(
648
+ original_size, grid, scale_resolution, patch_size, allow_upscale=False
649
+ ):
650
+ width, height = original_size
651
+ grid_x, grid_y = grid
652
+
653
+ refine_width = ensure_divide(width, grid_x)
654
+ refine_height = ensure_divide(height, grid_y)
655
+
656
+ grid_width = refine_width / grid_x
657
+ grid_height = refine_height / grid_y
658
+
659
+ best_grid_size = find_best_resize(
660
+ (grid_width, grid_height),
661
+ scale_resolution,
662
+ patch_size,
663
+ allow_upscale=allow_upscale,
664
+ )
665
+
666
+ refine_size = (best_grid_size[0] * grid_x, best_grid_size[1] * grid_y)
667
+
668
+ return refine_size
669
+
670
+
671
+ def split_to_patches(image, grid):
672
+ patches = []
673
+ width, height = image.size
674
+ grid_x = int(width / grid[0])
675
+ grid_y = int(height / grid[1])
676
+
677
+ for i in range(0, height, grid_y):
678
+ images = []
679
+ for j in range(0, width, grid_x):
680
+ box = (j, i, j + grid_x, i + grid_y)
681
+ patch = image.crop(box)
682
+ images.append(patch)
683
+ patches.append(images)
684
+
685
+ return patches
686
+
687
+
688
+ def get_grid_placeholder(tokenizer, grid, query_num):
689
+ image_placeholder = (
690
+ tokenizer.im_start + tokenizer.unk_token * query_num + tokenizer.im_end
691
+ )
692
+
693
+ cols = grid[0]
694
+ rows = grid[1]
695
+ slices = []
696
+ for i in range(rows):
697
+ lines = []
698
+ for j in range(cols):
699
+ lines.append(image_placeholder)
700
+ slices.append("".join(lines))
701
+ slice_placeholder = tokenizer.slice_start + "\n".join(slices) + tokenizer.slice_end
702
+ return slice_placeholder
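
As a rough illustration of the slicing heuristic implemented above, the following self-contained sketch reproduces the grid-selection math from `slice_image` for a hypothetical 1920×1080 input with the default `scale_resolution=448` and `max_slice_nums=9` (the sizes are illustrative, not taken from the repository):

```python
import math

# Hypothetical input size; the defaults mirror slice_image above.
original_width, original_height = 1920, 1080
scale_resolution, max_slice_nums = 448, 9

log_ratio = math.log(original_width / original_height)
ratio = original_width * original_height / (scale_resolution * scale_resolution)
multiple = min(math.ceil(ratio), max_slice_nums)  # how many slices the image "wants"

# Enumerate candidate grids around `multiple`, in the same order slice_image does.
candidates = []
for n in (multiple - 1, multiple, multiple + 1):
    if n == 1 or n > max_slice_nums:
        continue
    for m in range(1, n + 1):
        if n % m == 0:
            candidates.append((m, n // m))

# Pick the grid whose aspect ratio is closest to the image's (in log space).
best_grid = min(candidates, key=lambda g: abs(log_ratio - math.log(g[0] / g[1])))
print(best_grid)  # (4, 2): 4 columns x 2 rows for a 16:9 image
```

For such a grid, `get_grid_placeholder` then emits one image placeholder per patch, joins the placeholders of each row, joins rows with newlines, and wraps the result in the slice markers (presumably the `<slice>`/`</slice>` special tokens registered in the tokenizer configuration below).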
resampler.py ADDED
@@ -0,0 +1,163 @@
1
+ from functools import partial
2
+ import numpy as np
3
+
4
+ import torch
5
+ from torch import nn
6
+ from torch.nn.init import trunc_normal_
7
+
8
+ def get_2d_sincos_pos_embed(embed_dim, image_size):
9
+ """
10
+ image_size: image size (int) or (image_height, image_width)
11
+ return:
12
+ pos_embed: [image_height, image_width, embed_dim]
13
+ """
14
+ if isinstance(image_size, int):
15
+ grid_h_size, grid_w_size = image_size, image_size
16
+ else:
17
+ grid_h_size, grid_w_size = image_size[0], image_size[1]
18
+
19
+ grid_h = np.arange(grid_h_size, dtype=np.float32)
20
+ grid_w = np.arange(grid_w_size, dtype=np.float32)
21
+ grid = np.meshgrid(grid_w, grid_h) # here w goes first
22
+ grid = np.stack(grid, axis=0)
23
+
24
+ pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)
25
+ return pos_embed
26
+
27
+
28
+ def get_2d_sincos_pos_embed_from_grid(embed_dim, grid):
29
+ assert embed_dim % 2 == 0
30
+
31
+ # use half of dimensions to encode grid_h
32
+ emb_h = get_1d_sincos_pos_embed_from_grid_new(embed_dim // 2, grid[0]) # (H, W, D/2)
33
+ emb_w = get_1d_sincos_pos_embed_from_grid_new(embed_dim // 2, grid[1]) # (H, W, D/2)
34
+
35
+ emb = np.concatenate([emb_h, emb_w], axis=-1) # (H, W, D)
36
+ return emb
37
+
38
+
39
+ def get_1d_sincos_pos_embed_from_grid_new(embed_dim, pos):
40
+ """
41
+ embed_dim: output dimension for each position
42
+ pos: a list of positions to be encoded: size (H, W)
43
+ out: (H, W, D)
44
+ """
45
+ assert embed_dim % 2 == 0
46
+ omega = np.arange(embed_dim // 2, dtype=np.float32)
47
+ omega /= embed_dim / 2.
48
+ omega = 1. / 10000 ** omega # (D/2,)
49
+
50
+ out = np.einsum('hw,d->hwd', pos, omega) # (H, W, D/2), outer product
51
+
52
+ emb_sin = np.sin(out) # (H, W, D/2)
53
+ emb_cos = np.cos(out) # (H, W, D/2)
54
+
55
+ emb = np.concatenate([emb_sin, emb_cos], axis=-1) # (H, W, D)
56
+ return emb
57
+
58
+
59
+ class Resampler(nn.Module):
60
+ """
61
+ A 2D perceiver-resampler network with one cross-attention layer, using
62
+ given learnable queries and 2D sincos pos_emb
63
+ Outputs:
64
+ A tensor with the shape of (batch_size, num_queries, embed_dim)
65
+ """
66
+
67
+ def __init__(
68
+ self,
69
+ num_queries,
70
+ embed_dim,
71
+ num_heads,
72
+ kv_dim=None,
73
+ norm_layer=partial(nn.LayerNorm, eps=1e-6),
74
+ adaptive=False,
75
+ max_size=(70, 70),
76
+ ):
77
+ super().__init__()
78
+ self.num_queries = num_queries
79
+ self.embed_dim = embed_dim
80
+ self.num_heads = num_heads
81
+ self.adaptive = adaptive
82
+ self.max_size = max_size
83
+
84
+ self.query = nn.Parameter(torch.zeros(self.num_queries, embed_dim))
85
+ trunc_normal_(self.query, std=.02)
86
+
87
+ if kv_dim is not None and kv_dim != embed_dim:
88
+ self.kv_proj = nn.Linear(kv_dim, embed_dim, bias=False)
89
+ else:
90
+ self.kv_proj = nn.Identity()
91
+
92
+ self.attn = nn.MultiheadAttention(embed_dim, num_heads)
93
+ self.ln_q = norm_layer(embed_dim)
94
+ self.ln_kv = norm_layer(embed_dim)
95
+
96
+ self.ln_post = norm_layer(embed_dim)
97
+ self.proj = nn.Parameter((embed_dim ** -0.5) * torch.randn(embed_dim, embed_dim))
98
+
99
+ self._set_2d_pos_cache(self.max_size)
100
+ self.apply(self._init_weights)
101
+
102
+ def _set_2d_pos_cache(self, max_size, device='cpu'):
103
+ pos_embed = torch.from_numpy(get_2d_sincos_pos_embed(self.embed_dim, max_size)).float().to(device)
104
+ self.register_buffer("pos_embed", pos_embed, persistent=False)
105
+
106
+ def _adjust_pos_cache(self, tgt_sizes, device):
107
+ max_h = torch.max(tgt_sizes[:, 0])
108
+ max_w = torch.max(tgt_sizes[:, 1])
109
+ if max_h > self.max_size[0] or max_w > self.max_size[1]:
110
+ self.max_size = [max(max_h, self.max_size[0]), max(max_w, self.max_size[1])]
111
+ self._set_2d_pos_cache(self.max_size, device)
112
+
113
+ def _init_weights(self, m):
114
+ if isinstance(m, nn.Linear):
115
+ trunc_normal_(m.weight, std=.02)
116
+ if isinstance(m, nn.Linear) and m.bias is not None:
117
+ nn.init.constant_(m.bias, 0)
118
+ elif isinstance(m, nn.LayerNorm):
119
+ nn.init.constant_(m.bias, 0)
120
+ nn.init.constant_(m.weight, 1.0)
121
+
122
+ def forward(self, x, tgt_sizes=None):
123
+ assert x.shape[0] == tgt_sizes.shape[0]
124
+ bs = x.shape[0]
125
+
126
+ device = x.device
127
+ dtype = x.dtype
128
+
129
+ patch_len = tgt_sizes[:, 0] * tgt_sizes[:, 1]
130
+
131
+ self._adjust_pos_cache(tgt_sizes, device=device)
132
+
133
+ max_patch_len = torch.max(patch_len)
134
+ key_padding_mask = torch.zeros((bs, max_patch_len), dtype=torch.bool, device=device)
135
+
136
+ pos_embed = []
137
+ for i in range(bs):
138
+ tgt_h, tgt_w = tgt_sizes[i]
139
+ pos_embed.append(self.pos_embed[:tgt_h, :tgt_w, :].reshape((tgt_h * tgt_w, -1)).to(dtype)) # patches * D
140
+ key_padding_mask[i, patch_len[i]:] = True
141
+
142
+ pos_embed = torch.nn.utils.rnn.pad_sequence(
143
+ pos_embed, batch_first=True, padding_value=0.0).permute(1, 0, 2) # BLD => L * B * D
144
+
145
+ x = self.kv_proj(x) # B * L * D
146
+ x = self.ln_kv(x).permute(1, 0, 2) # L * B * D
147
+
148
+ q = self.ln_q(self.query) # Q * D
149
+
150
+ out = self.attn(
151
+ self._repeat(q, bs), # Q * B * D
152
+ x + pos_embed, # L * B * D + L * B * D
153
+ x,
154
+ key_padding_mask=key_padding_mask)[0]
155
+ # out: Q * B * D
156
+ x = out.permute(1, 0, 2) # B * Q * D
157
+
158
+ x = self.ln_post(x)
159
+ x = x @ self.proj
160
+ return x
161
+
162
+ def _repeat(self, query, N: int):
163
+ return query.unsqueeze(1).repeat(1, N, 1)
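
A minimal smoke-test sketch for the resampler above; it assumes `resampler.py` is importable from the working directory, and the dimensions are illustrative rather than the checkpoint's actual configuration:

```python
import torch
from resampler import Resampler  # assumes the resampler.py shown above is on the path

# Illustrative sizes only (not the real model config).
resampler = Resampler(num_queries=96, embed_dim=512, num_heads=8, kv_dim=1152)

# Two images whose vision encoder produced 32x32 and 24x40 patch grids.
tgt_sizes = torch.tensor([[32, 32], [24, 40]])                 # per-image (h, w)
max_patches = int((tgt_sizes[:, 0] * tgt_sizes[:, 1]).max())   # 1024
x = torch.randn(2, max_patches, 1152)                          # padded patch features

out = resampler(x, tgt_sizes)
print(out.shape)  # torch.Size([2, 96, 512]): a fixed number of query tokens per image
```

However many vision patches an image contributes, the cross-attention compresses it to `num_queries` output tokens, so each image (or image slice) occupies a fixed-length visual prefix in the language model.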
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|begin_of_text|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|end_of_text|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "!",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
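
To sanity-check that the image-related special tokens resolve to the IDs registered in the `added_tokens_decoder` of `tokenizer_config.json` below, a short check along these lines should work (a sketch; it assumes the `transformers` library is installed and that any custom tokenizer class in the repository loads via `trust_remote_code`):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True)

tokens = ["<image>", "</image>", "<slice>", "</slice>", "<unk>"]
print(dict(zip(tokens, tok.convert_tokens_to_ids(tokens))))
# Expected, per the token table below:
# {'<image>': 128010, '</image>': 128011, '<slice>': 128020, '</slice>': 128021, '<unk>': 128002}
```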
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,2072 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<unk>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|reserved_special_token_1|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|reserved_special_token_2|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_3|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|reserved_special_token_4|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<image>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "</image>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<ref>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "</ref>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<box>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "</box>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<quad>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "</quad>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<point>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "</point>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<slice>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "</slice>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_17|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_18|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_19|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_20|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_21|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_22|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_23|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_24|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_25|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_26|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_27|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_28|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_29|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_30|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_31|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_32|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_33|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_34|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_35|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_36|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_37|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_38|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_39|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_40|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_41|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_42|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_43|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_44|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_45|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_46|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_47|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_48|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_49|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_50|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_51|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_52|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_53|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_54|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_55|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_56|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_57|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_58|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_59|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_60|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_61|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_62|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_63|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_64|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_65|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_66|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_67|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_68|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_69|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_70|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_71|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_72|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_73|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_74|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_75|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_76|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_77|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_78|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_79|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_80|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_81|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_82|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_83|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_84|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_85|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_86|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_87|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_88|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_89|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_90|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_91|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_92|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_93|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_94|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_95|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_96|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_97|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_98|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_99|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_100|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_101|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_102|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_103|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_104|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_105|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_106|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_107|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_108|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_109|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_110|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_111|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_112|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_113|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_114|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_115|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_116|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_117|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_118|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_119|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_120|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_121|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_122|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_123|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_124|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_125|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_126|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_127|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_128|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_129|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_130|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_131|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_132|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_133|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_134|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_135|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_136|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_137|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_138|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_139|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_140|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_141|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_142|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_143|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_144|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_145|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_146|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_147|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_148|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_149|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_150|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_151|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_152|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_153|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_154|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_155|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_156|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_157|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_158|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_159|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_160|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_161|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_162|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_163|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_164|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_165|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_166|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_167|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_168|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_169|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_170|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_171|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_172|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128178": {
1428
+ "content": "<|reserved_special_token_173|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128179": {
1436
+ "content": "<|reserved_special_token_174|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128180": {
1444
+ "content": "<|reserved_special_token_175|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128181": {
1452
+ "content": "<|reserved_special_token_176|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128182": {
1460
+ "content": "<|reserved_special_token_177|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128183": {
1468
+ "content": "<|reserved_special_token_178|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128184": {
1476
+ "content": "<|reserved_special_token_179|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128185": {
1484
+ "content": "<|reserved_special_token_180|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128186": {
1492
+ "content": "<|reserved_special_token_181|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128187": {
1500
+ "content": "<|reserved_special_token_182|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128188": {
1508
+ "content": "<|reserved_special_token_183|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128189": {
1516
+ "content": "<|reserved_special_token_184|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128190": {
1524
+ "content": "<|reserved_special_token_185|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128191": {
1532
+ "content": "<|reserved_special_token_186|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128192": {
1540
+ "content": "<|reserved_special_token_187|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128193": {
1548
+ "content": "<|reserved_special_token_188|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128194": {
1556
+ "content": "<|reserved_special_token_189|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128195": {
1564
+ "content": "<|reserved_special_token_190|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128196": {
1572
+ "content": "<|reserved_special_token_191|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128197": {
1580
+ "content": "<|reserved_special_token_192|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128198": {
1588
+ "content": "<|reserved_special_token_193|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128199": {
1596
+ "content": "<|reserved_special_token_194|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128200": {
1604
+ "content": "<|reserved_special_token_195|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128201": {
1612
+ "content": "<|reserved_special_token_196|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128202": {
1620
+ "content": "<|reserved_special_token_197|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128203": {
1628
+ "content": "<|reserved_special_token_198|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128204": {
1636
+ "content": "<|reserved_special_token_199|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128205": {
1644
+ "content": "<|reserved_special_token_200|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128206": {
1652
+ "content": "<|reserved_special_token_201|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128207": {
1660
+ "content": "<|reserved_special_token_202|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128208": {
1668
+ "content": "<|reserved_special_token_203|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128209": {
1676
+ "content": "<|reserved_special_token_204|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128210": {
1684
+ "content": "<|reserved_special_token_205|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128211": {
1692
+ "content": "<|reserved_special_token_206|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128212": {
1700
+ "content": "<|reserved_special_token_207|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128213": {
1708
+ "content": "<|reserved_special_token_208|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128214": {
1716
+ "content": "<|reserved_special_token_209|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128215": {
1724
+ "content": "<|reserved_special_token_210|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128216": {
1732
+ "content": "<|reserved_special_token_211|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128217": {
1740
+ "content": "<|reserved_special_token_212|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128218": {
1748
+ "content": "<|reserved_special_token_213|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128219": {
1756
+ "content": "<|reserved_special_token_214|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128220": {
1764
+ "content": "<|reserved_special_token_215|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128221": {
1772
+ "content": "<|reserved_special_token_216|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128222": {
1780
+ "content": "<|reserved_special_token_217|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128223": {
1788
+ "content": "<|reserved_special_token_218|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128224": {
1796
+ "content": "<|reserved_special_token_219|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128225": {
1804
+ "content": "<|reserved_special_token_220|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128226": {
1812
+ "content": "<|reserved_special_token_221|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128227": {
1820
+ "content": "<|reserved_special_token_222|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128228": {
1828
+ "content": "<|reserved_special_token_223|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128229": {
1836
+ "content": "<|reserved_special_token_224|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128230": {
1844
+ "content": "<|reserved_special_token_225|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128231": {
1852
+ "content": "<|reserved_special_token_226|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128232": {
1860
+ "content": "<|reserved_special_token_227|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128233": {
1868
+ "content": "<|reserved_special_token_228|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128234": {
1876
+ "content": "<|reserved_special_token_229|>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": true
1882
+ },
1883
+ "128235": {
1884
+ "content": "<|reserved_special_token_230|>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": true
1890
+ },
1891
+ "128236": {
1892
+ "content": "<|reserved_special_token_231|>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": true
1898
+ },
1899
+ "128237": {
1900
+ "content": "<|reserved_special_token_232|>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": true
1906
+ },
1907
+ "128238": {
1908
+ "content": "<|reserved_special_token_233|>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": true
1914
+ },
1915
+ "128239": {
1916
+ "content": "<|reserved_special_token_234|>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": true
1922
+ },
1923
+ "128240": {
1924
+ "content": "<|reserved_special_token_235|>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": true
1930
+ },
1931
+ "128241": {
1932
+ "content": "<|reserved_special_token_236|>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": true
1938
+ },
1939
+ "128242": {
1940
+ "content": "<|reserved_special_token_237|>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": true
1946
+ },
1947
+ "128243": {
1948
+ "content": "<|reserved_special_token_238|>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": true
1954
+ },
1955
+ "128244": {
1956
+ "content": "<|reserved_special_token_239|>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": true
1962
+ },
1963
+ "128245": {
1964
+ "content": "<|reserved_special_token_240|>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": true
1970
+ },
1971
+ "128246": {
1972
+ "content": "<|reserved_special_token_241|>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": true
1978
+ },
1979
+ "128247": {
1980
+ "content": "<|reserved_special_token_242|>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": true
1986
+ },
1987
+ "128248": {
1988
+ "content": "<|reserved_special_token_243|>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": true
1994
+ },
1995
+ "128249": {
1996
+ "content": "<|reserved_special_token_244|>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": true
2002
+ },
2003
+ "128250": {
2004
+ "content": "<|reserved_special_token_245|>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": true
2010
+ },
2011
+ "128251": {
2012
+ "content": "<|reserved_special_token_246|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128252": {
2020
+ "content": "<|reserved_special_token_247|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128253": {
2028
+ "content": "<|reserved_special_token_248|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128254": {
2036
+ "content": "<|reserved_special_token_249|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128255": {
2044
+ "content": "<|reserved_special_token_250|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ }
2051
+ },
2052
+ "auto_map": {
2053
+ "AutoTokenizer": [
2054
+ "modeling_minicpmv.PreTrainedTokenizerFastWrapper",
2055
+ null
2056
+ ]
2057
+ },
2058
+ "bos_token": "<|begin_of_text|>",
2059
+ "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
2060
+ "clean_up_tokenization_spaces": true,
2061
+ "eos_token": "<|end_of_text|>",
2062
+ "model_input_names": [
2063
+ "input_ids",
2064
+ "attention_mask"
2065
+ ],
2066
+ "model_max_length": 1000000000000000019884624838656,
2067
+ "pad_token": "!",
2068
+ "padding_side": "right",
2069
+ "tokenizer_class": "PreTrainedTokenizerFastWrapper",
2070
+ "truncation_side": "right",
2071
+ "unk_token": "<unk>"
2072
+ }
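
The closing keys of this tokenizer_config.json determine how prompts are assembled at inference time: the tokenizer is loaded through the custom `PreTrainedTokenizerFastWrapper` declared in `auto_map`, `<|begin_of_text|>` / `<|end_of_text|>` serve as BOS/EOS, and the Llama-3-style `chat_template` wraps each turn in `<|start_header_id|> … <|eot_id|>` markers. The sketch below is a minimal illustration (not the repository's official usage example) of rendering a text-only conversation with the standard `transformers` API; the message content is made up, and image inputs are handled by the model code rather than by the bare tokenizer.

```python
# Minimal sketch: render a chat prompt with the chat_template defined above.
# Assumes the `transformers` library is installed; the example message is illustrative.
from transformers import AutoTokenizer

# trust_remote_code is required because tokenizer_class / auto_map point to the
# custom PreTrainedTokenizerFastWrapper shipped in this repository.
tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

messages = [{"role": "user", "content": "What is in the image?"}]

# The template itself prepends bos_token to the first turn and always appends the
# assistant header, so no extra generation-prompt handling is needed here.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# What is in the image?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
#

# The added_tokens_decoder entries above map reserved ids to their literal strings:
print(tokenizer.convert_ids_to_tokens(128200))  # <|reserved_special_token_195|>
```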