doyoungkim commited on
Commit
d308a85
·
verified ·
1 Parent(s): dedb1d4

Add <action> token and update config

Browse files
README.md ADDED
@@ -0,0 +1,270 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ datasets:
5
+ - HuggingFaceM4/the_cauldron
6
+ - HuggingFaceM4/Docmatix
7
+ - lmms-lab/LLaVA-OneVision-Data
8
+ - lmms-lab/M4-Instruct-Data
9
+ - HuggingFaceFV/finevideo
10
+ - MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
11
+ - lmms-lab/LLaVA-Video-178K
12
+ - orrzohar/Video-STaR
13
+ - Mutonix/Vript
14
+ - TIGER-Lab/VISTA-400K
15
+ - Enxin/MovieChat-1K_train
16
+ - ShareGPT4Video/ShareGPT4Video
17
+ pipeline_tag: image-text-to-text
18
+ language:
19
+ - en
20
+ base_model:
21
+ - HuggingFaceTB/SmolVLM-500M-Instruct
22
+ ---
23
+
24
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/SmolVLM2_banner.png" width="800" height="auto" alt="Image description">
25
+
26
+ # SmolVLM2-500M-Video
27
+
28
+ SmolVLM2-500M-Video is a lightweight multimodal model designed to analyze video content. The model processes videos, images, and text inputs to generate text outputs - whether answering questions about media files, comparing visual content, or transcribing text from images. Despite its compact size, requiring only 1.8GB of GPU RAM for video inference, it delivers robust performance on complex multimodal tasks. This efficiency makes it particularly well-suited for on-device applications where computational resources may be limited.
29
+ ## Model Summary
30
+
31
+ - **Developed by:** Hugging Face 🤗
32
+ - **Model type:** Multi-modal model (image/multi-image/video/text)
33
+ - **Language(s) (NLP):** English
34
+ - **License:** Apache 2.0
35
+ - **Architecture:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
36
+
37
+ ## Resources
38
+
39
+ - **Demo:** [Video Highlight Generator](https://huggingface.co/spaces/HuggingFaceTB/SmolVLM2-HighlightGenerator)
40
+ - **Blog:** [Blog post](https://huggingface.co/blog/smolvlm2)
41
+
42
+ ## Uses
43
+
44
+ SmolVLM2 can be used for inference on multimodal (video / image / text) tasks where the input consists of text queries along with video or one or more images. Text and media files can be interleaved arbitrarily, enabling tasks like captioning, visual question answering, and storytelling based on visual content. The model does not support image or video generation.
45
+
46
+ To fine-tune SmolVLM2 on a specific task, you can follow [the fine-tuning tutorial](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).
47
+
48
+ ## Evaluation
49
+
50
+ We evaluated the performance of the SmolVLM2 family on the following scientific benchmarks:
51
+
52
+ | Size | Video-MME | MLVU | MVBench |
53
+ |----------|-----------------|----------|---------------|
54
+ | 2.2B | 52.1 | 55.2 | 46.27 |
55
+ | 500M | 42.2 | 47.3 | 39.73 |
56
+ | 256M | 33.7 | 40.6 | 32.7 |
57
+
58
+
59
+ ### How to get started
60
+
61
+ You can use transformers to load, infer and fine-tune SmolVLM. Make sure you have num2words, flash-attn and latest transformers installed.
62
+ You can load the model as follows.
63
+
64
+ ```python
65
+ from transformers import AutoProcessor, AutoModelForImageTextToText
66
+ import torch
67
+
68
+ model_path = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"
69
+ processor = AutoProcessor.from_pretrained(model_path)
70
+ model = AutoModelForImageTextToText.from_pretrained(
71
+ model_path,
72
+ torch_dtype=torch.bfloat16,
73
+ _attn_implementation="flash_attention_2"
74
+ ).to("cuda")
75
+ ```
76
+
77
+ #### Simple Inference
78
+
79
+ You preprocess your inputs directly using chat templates and directly passing them
80
+
81
+ ```python
82
+ messages = [
83
+ {
84
+ "role": "user",
85
+ "content": [
86
+ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
87
+ {"type": "text", "text": "Can you describe this image?"},
88
+ ]
89
+ },
90
+ ]
91
+
92
+ inputs = processor.apply_chat_template(
93
+ messages,
94
+ add_generation_prompt=True,
95
+ tokenize=True,
96
+ return_dict=True,
97
+ return_tensors="pt",
98
+ ).to(model.device, dtype=torch.bfloat16)
99
+
100
+ generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
101
+ generated_texts = processor.batch_decode(
102
+ generated_ids,
103
+ skip_special_tokens=True,
104
+ )
105
+ print(generated_texts[0])
106
+ ```
107
+
108
+ #### Video Inference
109
+
110
+ To use SmolVLM2 for video inference, make sure you have decord installed.
111
+
112
+ ```python
113
+ messages = [
114
+ {
115
+ "role": "user",
116
+ "content": [
117
+ {"type": "video", "path": "path_to_video.mp4"},
118
+ {"type": "text", "text": "Describe this video in detail"}
119
+ ]
120
+ },
121
+ ]
122
+
123
+ inputs = processor.apply_chat_template(
124
+ messages,
125
+ add_generation_prompt=True,
126
+ tokenize=True,
127
+ return_dict=True,
128
+ return_tensors="pt",
129
+ ).to(model.device, dtype=torch.bfloat16)
130
+
131
+ generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
132
+ generated_texts = processor.batch_decode(
133
+ generated_ids,
134
+ skip_special_tokens=True,
135
+ )
136
+
137
+ print(generated_texts[0])
138
+ ```
139
+ #### Multi-image Interleaved Inference
140
+
141
+ You can interleave multiple media with text using chat templates.
142
+
143
+ ```python
144
+ import torch
145
+
146
+
147
+ messages = [
148
+ {
149
+ "role": "user",
150
+ "content": [
151
+ {"type": "text", "text": "What is the similarity between these two images?"},
152
+ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
153
+ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"},
154
+ ]
155
+ },
156
+ ]
157
+
158
+ inputs = processor.apply_chat_template(
159
+ messages,
160
+ add_generation_prompt=True,
161
+ tokenize=True,
162
+ return_dict=True,
163
+ return_tensors="pt",
164
+ ).to(model.device, dtype=torch.bfloat16)
165
+
166
+ generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
167
+ generated_texts = processor.batch_decode(
168
+ generated_ids,
169
+ skip_special_tokens=True,
170
+ )
171
+ print(generated_texts[0])
172
+ ```
173
+
174
+
175
+ ### Model optimizations
176
+
177
+ ## Misuse and Out-of-scope Use
178
+
179
+ SmolVLM is not intended for high-stakes scenarios or critical decision-making processes that affect an individual's well-being or livelihood. The model may produce content that appears factual but may not be accurate. Misuse includes, but is not limited to:
180
+
181
+ - Prohibited Uses:
182
+ - Evaluating or scoring individuals (e.g., in employment, education, credit)
183
+ - Critical automated decision-making
184
+ - Generating unreliable factual content
185
+ - Malicious Activities:
186
+ - Spam generation
187
+ - Disinformation campaigns
188
+ - Harassment or abuse
189
+ - Unauthorized surveillance
190
+
191
+ ### License
192
+
193
+ SmolVLM2 is built upon [SigLIP](https://huggingface.co/google/siglip-base-patch16-512) as image encoder and [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) for text decoder part.
194
+
195
+ We release the SmolVLM2 checkpoints under the Apache 2.0 license.
196
+
197
+ ## Citation information
198
+ You can cite us in the following way:
199
+ ```bibtex
200
+ @article{marafioti2025smolvlm,
201
+ title={SmolVLM: Redefining small and efficient multimodal models},
202
+ author={Andrés Marafioti and Orr Zohar and Miquel Farré and Merve Noyan and Elie Bakouch and Pedro Cuenca and Cyril Zakka and Loubna Ben Allal and Anton Lozhkov and Nouamane Tazi and Vaibhav Srivastav and Joshua Lochner and Hugo Larcher and Mathieu Morlon and Lewis Tunstall and Leandro von Werra and Thomas Wolf},
203
+ journal={arXiv preprint arXiv:2504.05299},
204
+ year={2025}
205
+ }
206
+ ```
207
+
208
+ ## Training Data
209
+ SmolVLM2 used 3.3M samples for training originally from ten different datasets: [LlaVa Onevision](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data), [M4-Instruct](https://huggingface.co/datasets/lmms-lab/M4-Instruct-Data), [Mammoth](https://huggingface.co/datasets/MAmmoTH-VL/MAmmoTH-VL-Instruct-12M), [LlaVa Video 178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K), [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo), [VideoStar](https://huggingface.co/datasets/orrzohar/Video-STaR), [VRipt](https://huggingface.co/datasets/Mutonix/Vript), [Vista-400K](https://huggingface.co/datasets/TIGER-Lab/VISTA-400K), [MovieChat](https://huggingface.co/datasets/Enxin/MovieChat-1K_train) and [ShareGPT4Video](https://huggingface.co/datasets/ShareGPT4Video/ShareGPT4Video).
210
+ In the following plots we give a general overview of the samples across modalities and the source of those samples.
211
+ <!--
212
+ <center><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_data_split.png" width="auto" height="auto" alt="Image description">
213
+ </center>
214
+
215
+ ### Details
216
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolvlm2_datadetails.png" width="auto" height="auto" alt="Image description"> -->
217
+
218
+ ## Data Split per modality
219
+
220
+ | Data Type | Percentage |
221
+ |--------------|------------|
222
+ | Image | 34.4% |
223
+ | Text | 20.2% |
224
+ | Video | 33.0% |
225
+ | Multi-image | 12.3% |
226
+
227
+
228
+ ## Granular dataset slices per modality
229
+
230
+ ### Text Datasets
231
+ | Dataset | Percentage |
232
+ |--------------------------------------------|------------|
233
+ | llava-onevision/magpie_pro_ft3_80b_mt | 6.8% |
234
+ | llava-onevision/magpie_pro_ft3_80b_tt | 6.8% |
235
+ | llava-onevision/magpie_pro_qwen2_72b_tt | 5.8% |
236
+ | llava-onevision/mathqa | 0.9% |
237
+
238
+ ### Multi-image Datasets
239
+ | Dataset | Percentage |
240
+ |--------------------------------------------|------------|
241
+ | m4-instruct-data/m4_instruct_multiimage | 10.4% |
242
+ | mammoth/multiimage-cap6 | 1.9% |
243
+
244
+ ### Image Datasets
245
+ | Dataset | Percentage |
246
+ |--------------------------------------------|------------|
247
+ | llava-onevision/other | 17.4% |
248
+ | llava-onevision/vision_flan | 3.9% |
249
+ | llava-onevision/mavis_math_metagen | 2.6% |
250
+ | llava-onevision/mavis_math_rule_geo | 2.5% |
251
+ | llava-onevision/sharegpt4o | 1.7% |
252
+ | llava-onevision/sharegpt4v_coco | 1.5% |
253
+ | llava-onevision/image_textualization | 1.3% |
254
+ | llava-onevision/sharegpt4v_llava | 0.9% |
255
+ | llava-onevision/mapqa | 0.9% |
256
+ | llava-onevision/qa | 0.8% |
257
+ | llava-onevision/textocr | 0.8% |
258
+
259
+ ### Video Datasets
260
+ | Dataset | Percentage |
261
+ |--------------------------------------------|------------|
262
+ | llava-video-178k/1-2m | 7.3% |
263
+ | llava-video-178k/2-3m | 7.0% |
264
+ | other-video/combined | 5.7% |
265
+ | llava-video-178k/hound | 4.4% |
266
+ | llava-video-178k/0-30s | 2.4% |
267
+ | video-star/starb | 2.2% |
268
+ | vista-400k/combined | 2.2% |
269
+ | vript/long | 1.0% |
270
+ | ShareGPT4Video/all | 0.8% |
added_tokens.json ADDED
@@ -0,0 +1,131 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "<action>": 49280,
3
+ "<end_of_utterance>": 49279,
4
+ "<fake_token_around_image>": 49189,
5
+ "<global-img>": 49152,
6
+ "<image>": 49190,
7
+ "<row_1_col_1>": 49153,
8
+ "<row_1_col_2>": 49154,
9
+ "<row_1_col_3>": 49155,
10
+ "<row_1_col_4>": 49156,
11
+ "<row_1_col_5>": 49157,
12
+ "<row_1_col_6>": 49158,
13
+ "<row_2_col_1>": 49159,
14
+ "<row_2_col_2>": 49160,
15
+ "<row_2_col_3>": 49161,
16
+ "<row_2_col_4>": 49162,
17
+ "<row_2_col_5>": 49163,
18
+ "<row_2_col_6>": 49164,
19
+ "<row_3_col_1>": 49165,
20
+ "<row_3_col_2>": 49166,
21
+ "<row_3_col_3>": 49167,
22
+ "<row_3_col_4>": 49168,
23
+ "<row_3_col_5>": 49169,
24
+ "<row_3_col_6>": 49170,
25
+ "<row_4_col_1>": 49171,
26
+ "<row_4_col_2>": 49172,
27
+ "<row_4_col_3>": 49173,
28
+ "<row_4_col_4>": 49174,
29
+ "<row_4_col_5>": 49175,
30
+ "<row_4_col_6>": 49176,
31
+ "<row_5_col_1>": 49177,
32
+ "<row_5_col_2>": 49178,
33
+ "<row_5_col_3>": 49179,
34
+ "<row_5_col_4>": 49180,
35
+ "<row_5_col_5>": 49181,
36
+ "<row_5_col_6>": 49182,
37
+ "<row_6_col_1>": 49183,
38
+ "<row_6_col_2>": 49184,
39
+ "<row_6_col_3>": 49185,
40
+ "<row_6_col_4>": 49186,
41
+ "<row_6_col_5>": 49187,
42
+ "<row_6_col_6>": 49188,
43
+ "<|reserved_special_token_0|>": 49191,
44
+ "<|reserved_special_token_10|>": 49201,
45
+ "<|reserved_special_token_11|>": 49202,
46
+ "<|reserved_special_token_12|>": 49203,
47
+ "<|reserved_special_token_13|>": 49204,
48
+ "<|reserved_special_token_14|>": 49205,
49
+ "<|reserved_special_token_15|>": 49206,
50
+ "<|reserved_special_token_16|>": 49207,
51
+ "<|reserved_special_token_17|>": 49208,
52
+ "<|reserved_special_token_18|>": 49209,
53
+ "<|reserved_special_token_19|>": 49210,
54
+ "<|reserved_special_token_1|>": 49192,
55
+ "<|reserved_special_token_20|>": 49211,
56
+ "<|reserved_special_token_21|>": 49212,
57
+ "<|reserved_special_token_22|>": 49213,
58
+ "<|reserved_special_token_23|>": 49214,
59
+ "<|reserved_special_token_24|>": 49215,
60
+ "<|reserved_special_token_25|>": 49216,
61
+ "<|reserved_special_token_26|>": 49217,
62
+ "<|reserved_special_token_27|>": 49218,
63
+ "<|reserved_special_token_28|>": 49219,
64
+ "<|reserved_special_token_29|>": 49220,
65
+ "<|reserved_special_token_2|>": 49193,
66
+ "<|reserved_special_token_30|>": 49221,
67
+ "<|reserved_special_token_31|>": 49222,
68
+ "<|reserved_special_token_32|>": 49223,
69
+ "<|reserved_special_token_33|>": 49224,
70
+ "<|reserved_special_token_34|>": 49225,
71
+ "<|reserved_special_token_35|>": 49226,
72
+ "<|reserved_special_token_36|>": 49227,
73
+ "<|reserved_special_token_37|>": 49228,
74
+ "<|reserved_special_token_38|>": 49229,
75
+ "<|reserved_special_token_39|>": 49230,
76
+ "<|reserved_special_token_3|>": 49194,
77
+ "<|reserved_special_token_40|>": 49231,
78
+ "<|reserved_special_token_41|>": 49232,
79
+ "<|reserved_special_token_42|>": 49233,
80
+ "<|reserved_special_token_43|>": 49234,
81
+ "<|reserved_special_token_44|>": 49235,
82
+ "<|reserved_special_token_45|>": 49236,
83
+ "<|reserved_special_token_46|>": 49237,
84
+ "<|reserved_special_token_47|>": 49238,
85
+ "<|reserved_special_token_48|>": 49239,
86
+ "<|reserved_special_token_49|>": 49240,
87
+ "<|reserved_special_token_4|>": 49195,
88
+ "<|reserved_special_token_50|>": 49241,
89
+ "<|reserved_special_token_51|>": 49242,
90
+ "<|reserved_special_token_52|>": 49243,
91
+ "<|reserved_special_token_53|>": 49244,
92
+ "<|reserved_special_token_54|>": 49245,
93
+ "<|reserved_special_token_55|>": 49246,
94
+ "<|reserved_special_token_56|>": 49247,
95
+ "<|reserved_special_token_57|>": 49248,
96
+ "<|reserved_special_token_58|>": 49249,
97
+ "<|reserved_special_token_59|>": 49250,
98
+ "<|reserved_special_token_5|>": 49196,
99
+ "<|reserved_special_token_60|>": 49251,
100
+ "<|reserved_special_token_61|>": 49252,
101
+ "<|reserved_special_token_62|>": 49253,
102
+ "<|reserved_special_token_63|>": 49254,
103
+ "<|reserved_special_token_64|>": 49255,
104
+ "<|reserved_special_token_65|>": 49256,
105
+ "<|reserved_special_token_66|>": 49257,
106
+ "<|reserved_special_token_67|>": 49258,
107
+ "<|reserved_special_token_68|>": 49259,
108
+ "<|reserved_special_token_69|>": 49260,
109
+ "<|reserved_special_token_6|>": 49197,
110
+ "<|reserved_special_token_70|>": 49261,
111
+ "<|reserved_special_token_71|>": 49262,
112
+ "<|reserved_special_token_72|>": 49263,
113
+ "<|reserved_special_token_73|>": 49264,
114
+ "<|reserved_special_token_74|>": 49265,
115
+ "<|reserved_special_token_75|>": 49266,
116
+ "<|reserved_special_token_76|>": 49267,
117
+ "<|reserved_special_token_77|>": 49268,
118
+ "<|reserved_special_token_78|>": 49269,
119
+ "<|reserved_special_token_79|>": 49270,
120
+ "<|reserved_special_token_7|>": 49198,
121
+ "<|reserved_special_token_80|>": 49271,
122
+ "<|reserved_special_token_81|>": 49272,
123
+ "<|reserved_special_token_82|>": 49273,
124
+ "<|reserved_special_token_83|>": 49274,
125
+ "<|reserved_special_token_84|>": 49275,
126
+ "<|reserved_special_token_85|>": 49276,
127
+ "<|reserved_special_token_86|>": 49277,
128
+ "<|reserved_special_token_87|>": 49278,
129
+ "<|reserved_special_token_8|>": 49199,
130
+ "<|reserved_special_token_9|>": 49200
131
+ }
chat_template.jinja ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ <|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '<image>' }}{% endif %}{% endfor %}<end_of_utterance>
2
+ {% endfor %}{% if add_generation_prompt %}{{ 'Assistant: ' }}{% endif %}
chat_template.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "chat_template": "<|im_start|>{% for message in messages %}{{message['role'] | capitalize}}{% if message['content'][0]['type'] == 'image' %}{{':'}}{% else %}{{': '}}{% endif %}{% for line in message['content'] %}{% if line['type'] == 'text' %}{{line['text']}}{% elif line['type'] == 'image' %}{{ '<image>' }}{% endif %}{% endfor %}<end_of_utterance>\n{% endfor %}{% if add_generation_prompt %}{{ 'Assistant: ' }}{% endif %}"
3
+ }
config.json ADDED
@@ -0,0 +1,141 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "SmolVLMForConditionalGeneration"
4
+ ],
5
+ "image_token_id": 49190,
6
+ "model_type": "smolvlm",
7
+ "pad_token_id": 128002,
8
+ "scale_factor": 4,
9
+ "text_config": {
10
+ "_flash_attn_2_enabled": true,
11
+ "_name_or_path": "None",
12
+ "architectures": [
13
+ "VLlama3ForCausalLM"
14
+ ],
15
+ "head_dim": 64,
16
+ "hidden_size": 960,
17
+ "intermediate_size": 2560,
18
+ "is_llama_config": true,
19
+ "max_position_embeddings": 8192,
20
+ "model_type": "llama",
21
+ "neftune_noise_alpha": 0.0,
22
+ "num_attention_heads": 15,
23
+ "num_hidden_layers": 32,
24
+ "num_key_value_heads": 5,
25
+ "pad_token_id": 2,
26
+ "perceiver_config": {
27
+ "_attn_implementation_autoset": false,
28
+ "_name_or_path": "",
29
+ "add_cross_attention": false,
30
+ "architectures": null,
31
+ "attention_dropout": 0.0,
32
+ "bad_words_ids": null,
33
+ "begin_suppress_tokens": null,
34
+ "bos_token_id": null,
35
+ "chunk_size_feed_forward": 0,
36
+ "cross_attention_hidden_size": null,
37
+ "decoder_start_token_id": null,
38
+ "diversity_penalty": 0.0,
39
+ "do_sample": false,
40
+ "early_stopping": false,
41
+ "encoder_no_repeat_ngram_size": 0,
42
+ "eos_token_id": null,
43
+ "exponential_decay_length_penalty": null,
44
+ "finetuning_task": null,
45
+ "forced_bos_token_id": null,
46
+ "forced_eos_token_id": null,
47
+ "hidden_act": "silu",
48
+ "id2label": {
49
+ "0": "LABEL_0",
50
+ "1": "LABEL_1"
51
+ },
52
+ "is_decoder": false,
53
+ "is_encoder_decoder": false,
54
+ "label2id": {
55
+ "LABEL_0": 0,
56
+ "LABEL_1": 1
57
+ },
58
+ "length_penalty": 1.0,
59
+ "max_length": 20,
60
+ "min_length": 0,
61
+ "model_type": "vllama3",
62
+ "no_repeat_ngram_size": 0,
63
+ "num_beam_groups": 1,
64
+ "num_beams": 1,
65
+ "num_key_value_heads": 1,
66
+ "num_return_sequences": 1,
67
+ "output_attentions": false,
68
+ "output_hidden_states": false,
69
+ "output_scores": false,
70
+ "pad_token_id": null,
71
+ "prefix": null,
72
+ "problem_type": null,
73
+ "pruned_heads": {},
74
+ "qk_layer_norms_perceiver": false,
75
+ "remove_invalid_values": false,
76
+ "repetition_penalty": 1.0,
77
+ "resampler_depth": 6,
78
+ "resampler_head_dim": 96,
79
+ "resampler_n_heads": 16,
80
+ "resampler_n_latents": 64,
81
+ "return_dict": true,
82
+ "return_dict_in_generate": false,
83
+ "sep_token_id": null,
84
+ "suppress_tokens": null,
85
+ "task_specific_params": null,
86
+ "temperature": 1.0,
87
+ "tf_legacy_loss": false,
88
+ "tie_encoder_decoder": false,
89
+ "tie_word_embeddings": true,
90
+ "tokenizer_class": null,
91
+ "top_k": 50,
92
+ "top_p": 1.0,
93
+ "torch_dtype": null,
94
+ "torchscript": false,
95
+ "transformers_version": "4.46.0",
96
+ "typical_p": 1.0,
97
+ "use_bfloat16": false
98
+ },
99
+ "pixel_shuffle_factor": 4,
100
+ "qk_layer_norms": false,
101
+ "rms_norm_eps": 1e-05,
102
+ "rope_interleaved": false,
103
+ "rope_theta": 100000,
104
+ "torch_dtype": "bfloat16",
105
+ "transformers.js_config": {
106
+ "kv_cache_dtype": {
107
+ "fp16": "float16",
108
+ "q4f16": "float16"
109
+ }
110
+ },
111
+ "use_resampler": false,
112
+ "vocab_size": 49281
113
+ },
114
+ "tie_word_embeddings": false,
115
+ "torch_dtype": "float32",
116
+ "transformers.js_config": {
117
+ "kv_cache_dtype": {
118
+ "fp16": "float16",
119
+ "q4f16": "float16"
120
+ }
121
+ },
122
+ "transformers_version": "4.47.1",
123
+ "use_cache": false,
124
+ "use_reentrant_checkpointing": false,
125
+ "vision_config": {
126
+ "hidden_size": 768,
127
+ "image_size": 512,
128
+ "max_image_size": {
129
+ "longest_edge": 512
130
+ },
131
+ "model_type": "smolvlm_vision",
132
+ "num_attention_heads": 12,
133
+ "patch_size": 16,
134
+ "size": {
135
+ "longest_edge": 512
136
+ },
137
+ "tie_word_embeddings": false,
138
+ "use_base_siglip": false
139
+ },
140
+ "vocab_size": 49281
141
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 0,
4
+ "eos_token_id": 49279,
5
+ "pad_token_id": 2,
6
+ "transformers_version": "4.47.1"
7
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b9bfd456c9472c0acd5719d6e514c4b859891af205ee1a736552fd3497b8b0c3
3
+ size 2029990624
onnx/decoder_model_merged.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2c8743f020606f401ffa0c0ec7a7055da6ad7518b981ecdb060b62da3a60ec45
3
+ size 1450426001
onnx/decoder_model_merged_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c20111c92ecee932e85de34e0afd9341f60bb210456bc5d8a4809c055dd5bc15
3
+ size 206503532
onnx/decoder_model_merged_fp16.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ace259f4ff3a070cfcefc41c6b4ddaf52a5c1b988348004da13043ff4a6de5ae
3
+ size 725489385
onnx/decoder_model_merged_int8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7d0a356562c7e474012c5e383de58d44d487a517c04190d26e5891db86cb2b1f
3
+ size 365039224
onnx/decoder_model_merged_q4.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:69561703c23810bf58cd4fe81b2427df95339a5b55743f62d1e767bd4a83a1e2
3
+ size 229119396
onnx/decoder_model_merged_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6265f0aa571ad624f67b5f19b79f1562b4bfd4e6db812815c1794172985b3e5e
3
+ size 205328508
onnx/decoder_model_merged_quantized.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a5aaa12d62d43a05a4a7456864c508d0b3c17c55a654c05fb568f6da0b7d212d
3
+ size 365039344
onnx/decoder_model_merged_uint8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a5aaa12d62d43a05a4a7456864c508d0b3c17c55a654c05fb568f6da0b7d212d
3
+ size 365039344
onnx/embed_tokens.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4696831e7181d72fd7736b0c119cd53141a46c940ca24c03ce05ba623b402dd4
3
+ size 189235499
onnx/embed_tokens_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dc09c2b3264365dcc8de8cae7513b640a40624cd038ede20a0ffcac0f82ecb11
3
+ size 189235518
onnx/embed_tokens_fp16.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bc9e790f2e90ebf3b1efe84ec82530d6c12e27b0a103703fe865a5faf24f2336
3
+ size 94617986
onnx/embed_tokens_int8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e6d275190cd1cdf3339225b98b42116871ddc4e0c3210801f02bf2a1d2e164a
3
+ size 47309344
onnx/embed_tokens_q4.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dc09c2b3264365dcc8de8cae7513b640a40624cd038ede20a0ffcac0f82ecb11
3
+ size 189235518
onnx/embed_tokens_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a330cc444671c47e6a2de559405404043b2a49b015188250eca8c0c88d263b73
3
+ size 94618005
onnx/embed_tokens_quantized.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e6d275190cd1cdf3339225b98b42116871ddc4e0c3210801f02bf2a1d2e164a
3
+ size 47309344
onnx/embed_tokens_uint8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0e6d275190cd1cdf3339225b98b42116871ddc4e0c3210801f02bf2a1d2e164a
3
+ size 47309344
onnx/vision_encoder.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d4797cdad0ba1958e7dfcc5e074d0b02ae3c4df80cce17ecab282dde37e0ddc1
3
+ size 393190822
onnx/vision_encoder_bnb4.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:749b03c76adc0311fdc2d347ace3c9f78c792816e519c3e28a17f12f27cd496d
3
+ size 60688904
onnx/vision_encoder_fp16.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9faa8f3b4a922514a2e7773f20e99fdf7dc7b4eeeeaca1fb82cf18f7945ca23a
3
+ size 196731834
onnx/vision_encoder_int8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d931eb204e21ab4d905d62976aeaefe20360076f52760ba55b84a5b35088284f
3
+ size 98966476
onnx/vision_encoder_q4.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2fb71423919f04851bdac96588267e6a9db95903ca94fcd7b7a0f51b5cd53d8a
3
+ size 66734064
onnx/vision_encoder_q4f16.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f7e811e4cef2c214c80ea7572eacb102ea396a881fb9170577b20c10fe4fa485
3
+ size 57691749
onnx/vision_encoder_quantized.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8e8e2950e6837a71a3d972a14eb287d8875c4c2d5e8cfd6001bf185115295cdd
3
+ size 98966521
onnx/vision_encoder_uint8.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8e8e2950e6837a71a3d972a14eb287d8875c4c2d5e8cfd6001bf185115295cdd
3
+ size 98966521
preprocessor_config.json ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "do_convert_rgb": true,
3
+ "do_image_splitting": true,
4
+ "do_normalize": true,
5
+ "do_pad": true,
6
+ "do_rescale": true,
7
+ "do_resize": true,
8
+ "image_mean": [
9
+ 0.5,
10
+ 0.5,
11
+ 0.5
12
+ ],
13
+ "image_processor_type": "SmolVLMImageProcessor",
14
+ "image_std": [
15
+ 0.5,
16
+ 0.5,
17
+ 0.5
18
+ ],
19
+ "max_image_size": {
20
+ "longest_edge": 512
21
+ },
22
+ "processor_class": "SmolVLMProcessor",
23
+ "resample": 1,
24
+ "rescale_factor": 0.00392156862745098,
25
+ "size": {
26
+ "longest_edge": 512
27
+ },
28
+ "video_sampling": {
29
+ "fps": 1,
30
+ "max_frames": 64,
31
+ "video_size": {
32
+ "longest_edge": 512
33
+ }
34
+ }
35
+ }
processor_config.json ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ {
2
+ "image_seq_len": 64,
3
+ "processor_class": "SmolVLMProcessor"
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "content": "<action>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ }
10
+ ],
11
+ "bos_token": {
12
+ "content": "<|im_start|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "end_of_utterance_token": "<end_of_utterance>",
19
+ "eos_token": {
20
+ "content": "<end_of_utterance>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false
25
+ },
26
+ "fake_image_token": "<fake_token_around_image>",
27
+ "global_image_token": "<global-img>",
28
+ "image_token": "<image>",
29
+ "pad_token": {
30
+ "content": "<|im_end|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false
35
+ },
36
+ "unk_token": {
37
+ "content": "<|endoftext|>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false
42
+ }
43
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,1197 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<repo_name>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<reponame>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<file_sep>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "6": {
53
+ "content": "<filename>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "7": {
61
+ "content": "<gh_stars>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "8": {
69
+ "content": "<issue_start>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "9": {
77
+ "content": "<issue_comment>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "10": {
85
+ "content": "<issue_closed>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "11": {
93
+ "content": "<jupyter_start>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "12": {
101
+ "content": "<jupyter_text>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "13": {
109
+ "content": "<jupyter_code>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "14": {
117
+ "content": "<jupyter_output>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "15": {
125
+ "content": "<jupyter_script>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "16": {
133
+ "content": "<empty_output>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ },
140
+ "49152": {
141
+ "content": "<global-img>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": true
147
+ },
148
+ "49153": {
149
+ "content": "<row_1_col_1>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": true
155
+ },
156
+ "49154": {
157
+ "content": "<row_1_col_2>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": true
163
+ },
164
+ "49155": {
165
+ "content": "<row_1_col_3>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "49156": {
173
+ "content": "<row_1_col_4>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "49157": {
181
+ "content": "<row_1_col_5>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "49158": {
189
+ "content": "<row_1_col_6>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "49159": {
197
+ "content": "<row_2_col_1>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "49160": {
205
+ "content": "<row_2_col_2>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "49161": {
213
+ "content": "<row_2_col_3>",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": true
219
+ },
220
+ "49162": {
221
+ "content": "<row_2_col_4>",
222
+ "lstrip": false,
223
+ "normalized": false,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": true
227
+ },
228
+ "49163": {
229
+ "content": "<row_2_col_5>",
230
+ "lstrip": false,
231
+ "normalized": false,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": true
235
+ },
236
+ "49164": {
237
+ "content": "<row_2_col_6>",
238
+ "lstrip": false,
239
+ "normalized": false,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": true
243
+ },
244
+ "49165": {
245
+ "content": "<row_3_col_1>",
246
+ "lstrip": false,
247
+ "normalized": false,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": true
251
+ },
252
+ "49166": {
253
+ "content": "<row_3_col_2>",
254
+ "lstrip": false,
255
+ "normalized": false,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": true
259
+ },
260
+ "49167": {
261
+ "content": "<row_3_col_3>",
262
+ "lstrip": false,
263
+ "normalized": false,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": true
267
+ },
268
+ "49168": {
269
+ "content": "<row_3_col_4>",
270
+ "lstrip": false,
271
+ "normalized": false,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": true
275
+ },
276
+ "49169": {
277
+ "content": "<row_3_col_5>",
278
+ "lstrip": false,
279
+ "normalized": false,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "49170": {
285
+ "content": "<row_3_col_6>",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "49171": {
293
+ "content": "<row_4_col_1>",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "49172": {
301
+ "content": "<row_4_col_2>",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "49173": {
309
+ "content": "<row_4_col_3>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "49174": {
317
+ "content": "<row_4_col_4>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "49175": {
325
+ "content": "<row_4_col_5>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "49176": {
333
+ "content": "<row_4_col_6>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "49177": {
341
+ "content": "<row_5_col_1>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "49178": {
349
+ "content": "<row_5_col_2>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "49179": {
357
+ "content": "<row_5_col_3>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "49180": {
365
+ "content": "<row_5_col_4>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "49181": {
373
+ "content": "<row_5_col_5>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "49182": {
381
+ "content": "<row_5_col_6>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "49183": {
389
+ "content": "<row_6_col_1>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "49184": {
397
+ "content": "<row_6_col_2>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "49185": {
405
+ "content": "<row_6_col_3>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "49186": {
413
+ "content": "<row_6_col_4>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "49187": {
421
+ "content": "<row_6_col_5>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "49188": {
429
+ "content": "<row_6_col_6>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "49189": {
437
+ "content": "<fake_token_around_image>",
438
+ "lstrip": false,
439
+ "normalized": false,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "49190": {
445
+ "content": "<image>",
446
+ "lstrip": false,
447
+ "normalized": false,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "49191": {
453
+ "content": "<|reserved_special_token_0|>",
454
+ "lstrip": false,
455
+ "normalized": false,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "49192": {
461
+ "content": "<|reserved_special_token_1|>",
462
+ "lstrip": false,
463
+ "normalized": false,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "49193": {
469
+ "content": "<|reserved_special_token_2|>",
470
+ "lstrip": false,
471
+ "normalized": false,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "49194": {
477
+ "content": "<|reserved_special_token_3|>",
478
+ "lstrip": false,
479
+ "normalized": false,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "49195": {
485
+ "content": "<|reserved_special_token_4|>",
486
+ "lstrip": false,
487
+ "normalized": false,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "49196": {
493
+ "content": "<|reserved_special_token_5|>",
494
+ "lstrip": false,
495
+ "normalized": false,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "49197": {
501
+ "content": "<|reserved_special_token_6|>",
502
+ "lstrip": false,
503
+ "normalized": false,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "49198": {
509
+ "content": "<|reserved_special_token_7|>",
510
+ "lstrip": false,
511
+ "normalized": false,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "49199": {
517
+ "content": "<|reserved_special_token_8|>",
518
+ "lstrip": false,
519
+ "normalized": false,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "49200": {
525
+ "content": "<|reserved_special_token_9|>",
526
+ "lstrip": false,
527
+ "normalized": false,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "49201": {
533
+ "content": "<|reserved_special_token_10|>",
534
+ "lstrip": false,
535
+ "normalized": false,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "49202": {
541
+ "content": "<|reserved_special_token_11|>",
542
+ "lstrip": false,
543
+ "normalized": false,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "49203": {
549
+ "content": "<|reserved_special_token_12|>",
550
+ "lstrip": false,
551
+ "normalized": false,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "49204": {
557
+ "content": "<|reserved_special_token_13|>",
558
+ "lstrip": false,
559
+ "normalized": false,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "49205": {
565
+ "content": "<|reserved_special_token_14|>",
566
+ "lstrip": false,
567
+ "normalized": false,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "49206": {
573
+ "content": "<|reserved_special_token_15|>",
574
+ "lstrip": false,
575
+ "normalized": false,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "49207": {
581
+ "content": "<|reserved_special_token_16|>",
582
+ "lstrip": false,
583
+ "normalized": false,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "49208": {
589
+ "content": "<|reserved_special_token_17|>",
590
+ "lstrip": false,
591
+ "normalized": false,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "49209": {
597
+ "content": "<|reserved_special_token_18|>",
598
+ "lstrip": false,
599
+ "normalized": false,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "49210": {
605
+ "content": "<|reserved_special_token_19|>",
606
+ "lstrip": false,
607
+ "normalized": false,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "49211": {
613
+ "content": "<|reserved_special_token_20|>",
614
+ "lstrip": false,
615
+ "normalized": false,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "49212": {
621
+ "content": "<|reserved_special_token_21|>",
622
+ "lstrip": false,
623
+ "normalized": false,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "49213": {
629
+ "content": "<|reserved_special_token_22|>",
630
+ "lstrip": false,
631
+ "normalized": false,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "49214": {
637
+ "content": "<|reserved_special_token_23|>",
638
+ "lstrip": false,
639
+ "normalized": false,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "49215": {
645
+ "content": "<|reserved_special_token_24|>",
646
+ "lstrip": false,
647
+ "normalized": false,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "49216": {
653
+ "content": "<|reserved_special_token_25|>",
654
+ "lstrip": false,
655
+ "normalized": false,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "49217": {
661
+ "content": "<|reserved_special_token_26|>",
662
+ "lstrip": false,
663
+ "normalized": false,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "49218": {
669
+ "content": "<|reserved_special_token_27|>",
670
+ "lstrip": false,
671
+ "normalized": false,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "49219": {
677
+ "content": "<|reserved_special_token_28|>",
678
+ "lstrip": false,
679
+ "normalized": false,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "49220": {
685
+ "content": "<|reserved_special_token_29|>",
686
+ "lstrip": false,
687
+ "normalized": false,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "49221": {
693
+ "content": "<|reserved_special_token_30|>",
694
+ "lstrip": false,
695
+ "normalized": false,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "49222": {
701
+ "content": "<|reserved_special_token_31|>",
702
+ "lstrip": false,
703
+ "normalized": false,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "49223": {
709
+ "content": "<|reserved_special_token_32|>",
710
+ "lstrip": false,
711
+ "normalized": false,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "49224": {
717
+ "content": "<|reserved_special_token_33|>",
718
+ "lstrip": false,
719
+ "normalized": false,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "49225": {
725
+ "content": "<|reserved_special_token_34|>",
726
+ "lstrip": false,
727
+ "normalized": false,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "49226": {
733
+ "content": "<|reserved_special_token_35|>",
734
+ "lstrip": false,
735
+ "normalized": false,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "49227": {
741
+ "content": "<|reserved_special_token_36|>",
742
+ "lstrip": false,
743
+ "normalized": false,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "49228": {
749
+ "content": "<|reserved_special_token_37|>",
750
+ "lstrip": false,
751
+ "normalized": false,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "49229": {
757
+ "content": "<|reserved_special_token_38|>",
758
+ "lstrip": false,
759
+ "normalized": false,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "49230": {
765
+ "content": "<|reserved_special_token_39|>",
766
+ "lstrip": false,
767
+ "normalized": false,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "49231": {
773
+ "content": "<|reserved_special_token_40|>",
774
+ "lstrip": false,
775
+ "normalized": false,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "49232": {
781
+ "content": "<|reserved_special_token_41|>",
782
+ "lstrip": false,
783
+ "normalized": false,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "49233": {
789
+ "content": "<|reserved_special_token_42|>",
790
+ "lstrip": false,
791
+ "normalized": false,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "49234": {
797
+ "content": "<|reserved_special_token_43|>",
798
+ "lstrip": false,
799
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "49235": {
805
+ "content": "<|reserved_special_token_44|>",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "49236": {
813
+ "content": "<|reserved_special_token_45|>",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "49237": {
821
+ "content": "<|reserved_special_token_46|>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "49238": {
829
+ "content": "<|reserved_special_token_47|>",
830
+ "lstrip": false,
831
+ "normalized": false,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": true
835
+ },
836
+ "49239": {
837
+ "content": "<|reserved_special_token_48|>",
838
+ "lstrip": false,
839
+ "normalized": false,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": true
843
+ },
844
+ "49240": {
845
+ "content": "<|reserved_special_token_49|>",
846
+ "lstrip": false,
847
+ "normalized": false,
848
+ "rstrip": false,
849
+ "single_word": false,
850
+ "special": true
851
+ },
852
+ "49241": {
853
+ "content": "<|reserved_special_token_50|>",
854
+ "lstrip": false,
855
+ "normalized": false,
856
+ "rstrip": false,
857
+ "single_word": false,
858
+ "special": true
859
+ },
860
+ "49242": {
861
+ "content": "<|reserved_special_token_51|>",
862
+ "lstrip": false,
863
+ "normalized": false,
864
+ "rstrip": false,
865
+ "single_word": false,
866
+ "special": true
867
+ },
868
+ "49243": {
869
+ "content": "<|reserved_special_token_52|>",
870
+ "lstrip": false,
871
+ "normalized": false,
872
+ "rstrip": false,
873
+ "single_word": false,
874
+ "special": true
875
+ },
876
+ "49244": {
877
+ "content": "<|reserved_special_token_53|>",
878
+ "lstrip": false,
879
+ "normalized": false,
880
+ "rstrip": false,
881
+ "single_word": false,
882
+ "special": true
883
+ },
884
+ "49245": {
885
+ "content": "<|reserved_special_token_54|>",
886
+ "lstrip": false,
887
+ "normalized": false,
888
+ "rstrip": false,
889
+ "single_word": false,
890
+ "special": true
891
+ },
892
+ "49246": {
893
+ "content": "<|reserved_special_token_55|>",
894
+ "lstrip": false,
895
+ "normalized": false,
896
+ "rstrip": false,
897
+ "single_word": false,
898
+ "special": true
899
+ },
900
+ "49247": {
901
+ "content": "<|reserved_special_token_56|>",
902
+ "lstrip": false,
903
+ "normalized": false,
904
+ "rstrip": false,
905
+ "single_word": false,
906
+ "special": true
907
+ },
908
+ "49248": {
909
+ "content": "<|reserved_special_token_57|>",
910
+ "lstrip": false,
911
+ "normalized": false,
912
+ "rstrip": false,
913
+ "single_word": false,
914
+ "special": true
915
+ },
916
+ "49249": {
917
+ "content": "<|reserved_special_token_58|>",
918
+ "lstrip": false,
919
+ "normalized": false,
920
+ "rstrip": false,
921
+ "single_word": false,
922
+ "special": true
923
+ },
924
+ "49250": {
925
+ "content": "<|reserved_special_token_59|>",
926
+ "lstrip": false,
927
+ "normalized": false,
928
+ "rstrip": false,
929
+ "single_word": false,
930
+ "special": true
931
+ },
932
+ "49251": {
933
+ "content": "<|reserved_special_token_60|>",
934
+ "lstrip": false,
935
+ "normalized": false,
936
+ "rstrip": false,
937
+ "single_word": false,
938
+ "special": true
939
+ },
940
+ "49252": {
941
+ "content": "<|reserved_special_token_61|>",
942
+ "lstrip": false,
943
+ "normalized": false,
944
+ "rstrip": false,
945
+ "single_word": false,
946
+ "special": true
947
+ },
948
+ "49253": {
949
+ "content": "<|reserved_special_token_62|>",
950
+ "lstrip": false,
951
+ "normalized": false,
952
+ "rstrip": false,
953
+ "single_word": false,
954
+ "special": true
955
+ },
956
+ "49254": {
957
+ "content": "<|reserved_special_token_63|>",
958
+ "lstrip": false,
959
+ "normalized": false,
960
+ "rstrip": false,
961
+ "single_word": false,
962
+ "special": true
963
+ },
964
+ "49255": {
965
+ "content": "<|reserved_special_token_64|>",
966
+ "lstrip": false,
967
+ "normalized": false,
968
+ "rstrip": false,
969
+ "single_word": false,
970
+ "special": true
971
+ },
972
+ "49256": {
973
+ "content": "<|reserved_special_token_65|>",
974
+ "lstrip": false,
975
+ "normalized": false,
976
+ "rstrip": false,
977
+ "single_word": false,
978
+ "special": true
979
+ },
980
+ "49257": {
981
+ "content": "<|reserved_special_token_66|>",
982
+ "lstrip": false,
983
+ "normalized": false,
984
+ "rstrip": false,
985
+ "single_word": false,
986
+ "special": true
987
+ },
988
+ "49258": {
989
+ "content": "<|reserved_special_token_67|>",
990
+ "lstrip": false,
991
+ "normalized": false,
992
+ "rstrip": false,
993
+ "single_word": false,
994
+ "special": true
995
+ },
996
+ "49259": {
997
+ "content": "<|reserved_special_token_68|>",
998
+ "lstrip": false,
999
+ "normalized": false,
1000
+ "rstrip": false,
1001
+ "single_word": false,
1002
+ "special": true
1003
+ },
1004
+ "49260": {
1005
+ "content": "<|reserved_special_token_69|>",
1006
+ "lstrip": false,
1007
+ "normalized": false,
1008
+ "rstrip": false,
1009
+ "single_word": false,
1010
+ "special": true
1011
+ },
1012
+ "49261": {
1013
+ "content": "<|reserved_special_token_70|>",
1014
+ "lstrip": false,
1015
+ "normalized": false,
1016
+ "rstrip": false,
1017
+ "single_word": false,
1018
+ "special": true
1019
+ },
1020
+ "49262": {
1021
+ "content": "<|reserved_special_token_71|>",
1022
+ "lstrip": false,
1023
+ "normalized": false,
1024
+ "rstrip": false,
1025
+ "single_word": false,
1026
+ "special": true
1027
+ },
1028
+ "49263": {
1029
+ "content": "<|reserved_special_token_72|>",
1030
+ "lstrip": false,
1031
+ "normalized": false,
1032
+ "rstrip": false,
1033
+ "single_word": false,
1034
+ "special": true
1035
+ },
1036
+ "49264": {
1037
+ "content": "<|reserved_special_token_73|>",
1038
+ "lstrip": false,
1039
+ "normalized": false,
1040
+ "rstrip": false,
1041
+ "single_word": false,
1042
+ "special": true
1043
+ },
1044
+ "49265": {
1045
+ "content": "<|reserved_special_token_74|>",
1046
+ "lstrip": false,
1047
+ "normalized": false,
1048
+ "rstrip": false,
1049
+ "single_word": false,
1050
+ "special": true
1051
+ },
1052
+ "49266": {
1053
+ "content": "<|reserved_special_token_75|>",
1054
+ "lstrip": false,
1055
+ "normalized": false,
1056
+ "rstrip": false,
1057
+ "single_word": false,
1058
+ "special": true
1059
+ },
1060
+ "49267": {
1061
+ "content": "<|reserved_special_token_76|>",
1062
+ "lstrip": false,
1063
+ "normalized": false,
1064
+ "rstrip": false,
1065
+ "single_word": false,
1066
+ "special": true
1067
+ },
1068
+ "49268": {
1069
+ "content": "<|reserved_special_token_77|>",
1070
+ "lstrip": false,
1071
+ "normalized": false,
1072
+ "rstrip": false,
1073
+ "single_word": false,
1074
+ "special": true
1075
+ },
1076
+ "49269": {
1077
+ "content": "<|reserved_special_token_78|>",
1078
+ "lstrip": false,
1079
+ "normalized": false,
1080
+ "rstrip": false,
1081
+ "single_word": false,
1082
+ "special": true
1083
+ },
1084
+ "49270": {
1085
+ "content": "<|reserved_special_token_79|>",
1086
+ "lstrip": false,
1087
+ "normalized": false,
1088
+ "rstrip": false,
1089
+ "single_word": false,
1090
+ "special": true
1091
+ },
1092
+ "49271": {
1093
+ "content": "<|reserved_special_token_80|>",
1094
+ "lstrip": false,
1095
+ "normalized": false,
1096
+ "rstrip": false,
1097
+ "single_word": false,
1098
+ "special": true
1099
+ },
1100
+ "49272": {
1101
+ "content": "<|reserved_special_token_81|>",
1102
+ "lstrip": false,
1103
+ "normalized": false,
1104
+ "rstrip": false,
1105
+ "single_word": false,
1106
+ "special": true
1107
+ },
1108
+ "49273": {
1109
+ "content": "<|reserved_special_token_82|>",
1110
+ "lstrip": false,
1111
+ "normalized": false,
1112
+ "rstrip": false,
1113
+ "single_word": false,
1114
+ "special": true
1115
+ },
1116
+ "49274": {
1117
+ "content": "<|reserved_special_token_83|>",
1118
+ "lstrip": false,
1119
+ "normalized": false,
1120
+ "rstrip": false,
1121
+ "single_word": false,
1122
+ "special": true
1123
+ },
1124
+ "49275": {
1125
+ "content": "<|reserved_special_token_84|>",
1126
+ "lstrip": false,
1127
+ "normalized": false,
1128
+ "rstrip": false,
1129
+ "single_word": false,
1130
+ "special": true
1131
+ },
1132
+ "49276": {
1133
+ "content": "<|reserved_special_token_85|>",
1134
+ "lstrip": false,
1135
+ "normalized": false,
1136
+ "rstrip": false,
1137
+ "single_word": false,
1138
+ "special": true
1139
+ },
1140
+ "49277": {
1141
+ "content": "<|reserved_special_token_86|>",
1142
+ "lstrip": false,
1143
+ "normalized": false,
1144
+ "rstrip": false,
1145
+ "single_word": false,
1146
+ "special": true
1147
+ },
1148
+ "49278": {
1149
+ "content": "<|reserved_special_token_87|>",
1150
+ "lstrip": false,
1151
+ "normalized": false,
1152
+ "rstrip": false,
1153
+ "single_word": false,
1154
+ "special": true
1155
+ },
1156
+ "49279": {
1157
+ "content": "<end_of_utterance>",
1158
+ "lstrip": false,
1159
+ "normalized": false,
1160
+ "rstrip": false,
1161
+ "single_word": false,
1162
+ "special": true
1163
+ },
1164
+ "49280": {
1165
+ "content": "<action>",
1166
+ "lstrip": false,
1167
+ "normalized": false,
1168
+ "rstrip": false,
1169
+ "single_word": false,
1170
+ "special": true
1171
+ }
1172
+ },
1173
+ "additional_special_tokens": [
1174
+ "<action>"
1175
+ ],
1176
+ "bos_token": "<|im_start|>",
1177
+ "clean_up_tokenization_spaces": false,
1178
+ "end_of_utterance_token": "<end_of_utterance>",
1179
+ "eos_token": "<end_of_utterance>",
1180
+ "extra_special_tokens": {
1181
+ "end_of_utterance_token": "<end_of_utterance>",
1182
+ "fake_image_token": "<fake_token_around_image>",
1183
+ "global_image_token": "<global-img>",
1184
+ "image_token": "<image>"
1185
+ },
1186
+ "fake_image_token": "<fake_token_around_image>",
1187
+ "global_image_token": "<global-img>",
1188
+ "image_token": "<image>",
1189
+ "legacy": false,
1190
+ "model_max_length": 8192,
1191
+ "pad_token": "<|im_end|>",
1192
+ "processor_class": "SmolVLMProcessor",
1193
+ "tokenizer_class": "GPT2Tokenizer",
1194
+ "truncation_side": "left",
1195
+ "unk_token": "<|endoftext|>",
1196
+ "vocab_size": 49152
1197
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff