inigopm committed on
Commit
a6a1f7a
0 Parent(s):

Initial commit

.gitattributes ADDED
@@ -0,0 +1,37 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
37
+ images/salamandra_header.png filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,366 @@
1
+ BSC-LT/salamandra-7b-vision RESEARCH-ONLY RAIL-AMS
2
+
3
+ Licensed Artifact(s):
4
+
5
+ - Application
6
+
7
+ - Model
8
+
9
+ - Source Code
10
+
11
+ Section I: PREAMBLE
12
+
13
+ This Research-Only RAIL License is generally applicable to the
14
+ Artifact(s) identified above.
15
+
16
+ For valuable consideration, You and Licensor agree as follows:
17
+
18
+ 1. Definitions
19
+
20
+ (a) “Application” refers to a sequence of instructions or statements
21
+ written in machine code language, including object code (that is the
22
+ product of a compiler), binary code (data using a two-symbol system)
23
+ or an intermediate language (such as register transfer language).
24
+
25
+ (b) “Artifact” refers to a software application (in either binary or
26
+ source code format), Model, and/or Source Code, in accordance with
27
+ what is specified above as the “Licensed Artifact”.
28
+
29
+ (c) “Contribution” means any work, including any modifications or
30
+ additions to an Artifact, that is intentionally submitted to
31
+ Licensor for inclusion or incorporation in the Artifact directly or
32
+ indirectly by the rights owner. For the purposes of this definition,
33
+ “submitted” means any form of electronic, verbal, or written
34
+ communication sent to the Licensor or its representatives, including
35
+ but not limited to communication on electronic mailing lists, source
36
+ code control systems, and issue tracking systems that are managed
37
+ by, or on behalf of, the Licensor for the purpose of discussing,
38
+ sharing and improving the Artifact, but excluding communication that
39
+ is conspicuously marked or otherwise designated in writing by the
40
+ contributor as “Not a Contribution.”
41
+
42
+ (d) “Contributor” means Licensor or any other individual or legal entity
43
+ that creates or owns a Contribution that is added to or incorporated
44
+ into an Artifact or its Derivative.
45
+
46
+ (e) “Data” means a collection of information and/or content extracted
47
+ from the dataset used with a given Model, including to train,
48
+ pretrain, or otherwise evaluate the Model. The Data is not licensed
49
+ under this License.
50
+
51
+ (f) “Derivative” means a work derived from or based upon an Artifact,
52
+ and includes all modified versions of such Artifact.
53
+
54
+ (g) “Distribution” means any transmission, reproduction, publication or
55
+ other sharing of an Artifact or Derivative to a third party,
56
+ including providing a hosted service incorporating the Artifact,
57
+ which is made available by electronic or other remote means -
58
+ e.g. API-based or web access.
59
+
60
+ (h) “Harm” includes but is not limited to physical, mental,
61
+ psychological, financial and reputational damage, pain, or loss.
62
+
63
+ (i) “License” means the terms and conditions for use, reproduction, and
64
+ Distribution as defined in this document.
65
+
66
+ (j) “Licensor” means the rights owner (by virtue of creation or
67
+ documented transfer of ownership) or entity authorized by the rights
68
+ owner (e.g., exclusive licensee) that is granting the rights in this
69
+ License.
70
+
71
+ (k) “Model” means any machine-learning based assembly or assemblies
72
+ (including checkpoints), consisting of learnt weights, parameters
73
+ (including optimizer states), corresponding to the model
74
+ architecture as embodied in the Source Code.
75
+
76
+ (l) “Output” means the results of operating a Model as embodied in
77
+ informational content resulting therefrom.
78
+
79
+ (m) “Permitted Purpose” means for academic or research purposes only.
80
+
81
+ (n) “Source Code” means any collection of text written using
82
+ human-readable programming language, including the code and scripts
83
+ used to define, run, load, benchmark or evaluate a Model or any
84
+ component thereof, and/or used to prepare data for training or
85
+ evaluation, if any. Source Code includes any accompanying
86
+ documentation, tutorials, examples, etc, if any. For clarity, the
87
+ term “Source Code” as used in this License includes any and all
88
+ Derivatives of such Source Code.
89
+
90
+ (o) “Third Parties” means individuals or legal entities that are not
91
+ under common control with Licensor or You.
92
+
93
+ (p) “Use” includes accessing, using, copying, modifying, and/or
94
+ distributing an Artifact; in connection with a Model as Artifact,
95
+ Use also includes creating content, fine-tuning, updating, running,
96
+ training, evaluating and/or re-parametrizing such Model.
97
+
98
+ (q) “You” (or “Your”) means an individual or legal entity receiving and
99
+ exercising permissions granted by this License and/or making use of
100
+ the Artifact for permitted purposes and in any permitted field of
101
+ use, including usage of the Artifact in an end-use application -
102
+ e.g. chatbot, translator, image generator, etc.
103
+
104
+ Section II: INTELLECTUAL PROPERTY RIGHTS
105
+
106
+ Both copyright and patent grants may apply to the Artifact. The Artifact
107
+ is subject to additional terms as described in Section III below, which
108
+ govern the use of the Artifact in the event that Section II is held
109
+ unenforceable or inapplicable.
110
+
111
+ 2. Grant of Copyright License. Conditioned upon compliance with Section
112
+ III below and subject to the terms and conditions of this License, each
113
+ Contributor hereby grants to You, only in connection with the Permitted
114
+ Purpose, a worldwide, non-exclusive, royalty-free copyright license to
115
+ reproduce, use, publicly display, publicly perform, sublicense, and
116
+ distribute the Artifact and Derivatives thereof.
117
+
118
+ 3. Grant of Patent License. Conditioned upon compliance with Section III
119
+ below and subject to the terms and conditions of this License, and only
120
+ where and as applicable, each Contributor hereby grants to You, only in
121
+ connection with the Permitted Purpose, a worldwide, non-exclusive,
122
+ royalty-free, irrevocable (except as stated in this paragraph) patent
123
+ license to make, have made, use, sell, offer to sell, import, and
124
+ otherwise transfer the Artifact where such license applies only to those
125
+ patent claims licensable by such Contributor that are necessarily
126
+ infringed by their Contribution(s) alone or by combination of their
127
+ Contribution(s) with the Artifact to which such Contribution(s) was
128
+ submitted. If You institute patent litigation against any entity
129
+ (including a cross-claim or counterclaim in a lawsuit) alleging that the
130
+ Artifact and/or a Contribution incorporated within the Artifact
131
+ constitutes direct or contributory patent infringement, then any patent
132
+ licenses granted to You under this License in connection with the
133
+ Artifact shall terminate as of the date such litigation is asserted or
134
+ filed.
135
+
136
+ Licensor and Contributor each have the right to grant the licenses
137
+ above.
138
+
139
+ Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
140
+
141
+ 4. Use-based restrictions. The restrictions set forth in Attachment A
142
+ are mandatory Use-based restrictions. Therefore You may not Use the
143
+ Artifact in violation of such restrictions. You may Use the Artifact
144
+ only subject to this License. You shall require all of Your users who
145
+ use the Artifact or its Derivative to comply with the terms of this
146
+ paragraph and only for the Permitted Purpose.
147
+
148
+ 5. The Output You Generate with a Model (as Artfact). Except as set
149
+ forth herein, Licensor claims no rights in the Output You generate. You
150
+ are accountable for the Output You generate and its subsequent uses. No
151
+ use of the Output may contravene any provision as stated in this
152
+ License.
153
+
154
+ 6. Distribution and Redistribution. You may host for Third Party remote
155
+ access purposes (e.g. software-as-a-service), reproduce and distribute
156
+ copies of the Artifact or its Derivatives in any medium, with or without
157
+ modifications, provided that You meet the following conditions:
158
+
159
+ 1. Use-based restrictions in paragraph 4 MUST be included as a
160
+ condition precedent to effect any type of legal agreement (e.g. a
161
+ license) governing the use and/or distribution of the Artifact or
162
+ its Derivatives, and You shall give such notice to any subsequent
163
+ Third Party recipients;
164
+ 2. You shall give any Third Party recipients of the Artifact or its
165
+ Derivatives a copy of this License;
166
+ 3. You shall cause any modified files to carry prominent notices
167
+ stating that You changed the files;
168
+ 4. You shall retain all copyright, patent, trademark, and attribution
169
+ notices excluding those notices that do not pertain to any part of
170
+ the Artifact or its Derivatives.
171
+ 5. You and any Third Party recipients of the Artifact or its Derivative
172
+ shall adhere to the Permitted Purpose.
173
+
174
+ You may add Your own copyright statement to Your modifications and may
175
+ provide additional or different license terms and conditions with
176
+ respect to paragraph 6.1., to govern the use, reproduction, or
177
+ Distribution of Your modifications, or for any Derivative, provided that
178
+ Your use, reproduction, and Distribution of the Artifact or its
179
+ Derivative otherwise complies with the conditions stated in this
180
+ License. In other words, the Use-based restrictions in Attachment A form
181
+ the minimum set of terms for You to license to Third Parties any
182
+ Artifact or its Derivative, but You may add more restrictive terms if
183
+ You deem it necessary.
184
+
185
+ Section IV: OTHER PROVISIONS
186
+
187
+ 7. Updates and Runtime Restrictions. To the maximum extent permitted by
188
+ law, Licensor reserves the right to restrict (remotely or otherwise)
189
+ usage of the Artifact in violation of this License or update the
190
+ Artifact through electronic means.
191
+
192
+ 8. Trademarks and related. Nothing in this License permits You to make
193
+ use of Licensors’ trademarks, trade names, logos or to otherwise suggest
194
+ endorsement or misrepresent the relationship between the parties; and
195
+ any rights not expressly granted herein are reserved by the Licensors.
196
+
197
+ 9. Disclaimer of Warranty. Unless required by applicable law or agreed
198
+ to in writing, Licensor provides the Artifact (and each Contributor
199
+ provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR
200
+ CONDITIONS OF ANY KIND, either express or implied, including, without
201
+ limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT,
202
+ MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely
203
+ responsible for determining the appropriateness of using the Artifact,
204
+ and assume any risks associated with Your exercise of permissions under
205
+ this License.
206
+
207
+ 10. Limitation of Liability. In no event and under no legal theory,
208
+ whether in tort (including negligence), contract, or otherwise, unless
209
+ required by applicable law (such as deliberate and grossly negligent
210
+ acts) or agreed to in writing, shall any Contributor be liable to You
211
+ for damages, including any direct, indirect, special, incidental, or
212
+ consequential damages of any character arising as a result of this
213
+ License or out of the use or inability to use the Artifact (including
214
+ but not limited to damages for loss of goodwill, work stoppage, computer
215
+ failure or malfunction, or any and all other commercial damages or
216
+ losses), even if such Contributor has been advised of the possibility of
217
+ such damages.
218
+
219
+ 11. If any provision of this License is held to be invalid, illegal or
220
+ unenforceable, the remaining provisions shall be unaffected thereby and
221
+ remain valid as if such provision had not been set forth herein.
222
+
223
+ 12. Term and Termination. The term of this License will commence upon
224
+ the earlier of (a) Your acceptance of this License or (b) accessing the
225
+ Artifact; and will continue in full force and effect until terminated in
226
+ accordance with the terms and conditions herein. Licensor may terminate
227
+ this License if You are in breach of any term or condition of this
228
+ Agreement. Upon termination of this Agreement, You shall delete and
229
+ cease use of the Artifact. Section 10 shall survive the termination of
230
+ this License.
231
+
232
+ END OF TERMS AND CONDITIONS
233
+
234
+ Attachment A
235
+
236
+ USE RESTRICTIONS
237
+
238
+ You agree not to use the Artifact or its Derivatives in any of the
239
+ following ways:
240
+
241
+ 1. Discrimination
242
+
243
+ (a) To discriminate or exploit individuals or groups based on
244
+ legally protected characteristics and/or vulnerabilities.
245
+
246
+ (b) For purposes of administration of justice, law enforcement,
247
+ immigration, or asylum processes, such as predicting that a
248
+ natural person will commit a crime or the likelihood thereof.
249
+
250
+ (c) To engage in, promote, incite, or facilitate discrimination or
251
+ other unlawful or harmful conduct in the provision of
252
+ employment, employment benefits, credit, housing, or other
253
+ essential goods and services.
254
+
255
+ 2. Military
256
+
257
+ (a) For weaponry or warfare.
258
+
259
+ (b) For purposes of building or optimizing military weapons or in
260
+ the service of nuclear proliferation or nuclear weapons
261
+ technology.
262
+
263
+ (c) For purposes of military surveillance, including any research or
264
+ development relating to military surveillance.
265
+
266
+ 3. Legal
267
+
268
+ (a) To engage or enable fully automated decision-making that
269
+ adversely impacts a natural person's legal rights without
270
+ expressly and intelligibly disclosing the impact to such natural
271
+ person and providing an appeal process.
272
+
273
+ (b) To engage or enable fully automated decision-making that
274
+ creates, modifies or terminates a binding, enforceable
275
+ obligation between entities; whether these include natural
276
+ persons or not.
277
+
278
+ (c) In any way that violates any applicable national, federal,
279
+ state, local or international law or regulation.
280
+
281
+ 4. Disinformation
282
+
283
+ (a) To create, present or disseminate verifiably false or misleading
284
+ information for economic gain or to intentionally deceive the
285
+ public, including creating false impersonations of natural
286
+ persons.
287
+
288
+ (b) To synthesize or modify a natural person's appearance, voice, or
289
+ other individual characteristics, unless prior informed consent
290
+ of said natural person is obtained.
291
+
292
+ (c) To autonomously interact with a natural person, in text or audio
293
+ format, unless disclosure and consent is given prior to
294
+ interaction that the system engaging in the interaction is not a
295
+ natural person.
296
+
297
+ (d) To defame or harm a natural person's reputation, such as by
298
+ generating, creating, promoting, or spreading defamatory content
299
+ (statements, images, or other content).
300
+
301
+ (e) To generate or disseminate information (including - but not
302
+ limited to - images, code, posts, articles), and place the
303
+ information in any public context without expressly and
304
+ intelligibly disclaiming that the information and/or content is
305
+ machine generated.
306
+
307
+ 5. Privacy
308
+
309
+ (a) To utilize personal information to infer additional personal
310
+ information about a natural person, including but not limited to
311
+ legally protected characteristics, vulnerabilities or
312
+ categories; unless informed consent from the data subject to
313
+ collect said inferred personal information for a stated purpose
314
+ and defined duration is received.
315
+
316
+ (b) To generate or disseminate personal identifiable information
317
+ that can be used to harm an individual or to invade the personal
318
+ privacy of an individual.
319
+
320
+ (c) To engage in, promote, incite, or facilitate the harassment,
321
+ abuse, threatening, or bullying of individuals or groups of
322
+ individuals.
323
+
324
+ 6. Health
325
+
326
+ (a) To provide medical advice or make clinical decisions without
327
+ necessary (external) accreditation of the system; unless the use
328
+ is (i) in an internal research context with independent and
329
+ accountable oversight and/or (ii) with medical professional
330
+ oversight that is accompanied by any related compulsory
331
+ certification and/or safety/quality standard for the
332
+ implementation of the technology.
333
+
334
+ (b) To provide medical advice and medical results interpretation
335
+ without external, human validation of such advice or
336
+ interpretation.
337
+
338
+ (c) In connection with any activities that present a risk of death
339
+ or bodily harm to individuals, including self-harm or harm to
340
+ others, or in connection with regulated or controlled
341
+ substances.
342
+
343
+ (d) In connection with activities that present a risk of death or
344
+ bodily harm to individuals, including inciting or promoting
345
+ violence, abuse, or any infliction of bodily harm to an
346
+ individual or group of individuals
347
+
348
+ 7. General
349
+
350
+ (a) To defame, disparage or otherwise harass others.
351
+
352
+ (b) To intentionally deceive or mislead others, including failing to
353
+ appropriately disclose to end users any known dangers of your
354
+ system.
355
+
356
+ 8. Research
357
+
358
+ (a) In connection with any academic dishonesty, including submitting
359
+ any informational content or output of a Model as Your own work
360
+ in any academic setting.
361
+
362
+ 9. Malware
363
+
364
+ (a) To generate and/or disseminate malware (including - but not
365
+ limited to - ransomware) or any other content to be used for the
366
+ purpose of Harming electronic systems;
README.md ADDED
@@ -0,0 +1,250 @@
1
+ ---
2
+ license: other
3
+ library_name: transformers
4
+ pipeline_tag: image-text-to-text
5
+ tags:
6
+ - vision
7
+ - image-text-to-text
8
+ language:
9
+ - bg
10
+ - ca
11
+ - code
12
+ - cs
13
+ - cy
14
+ - da
15
+ - de
16
+ - el
17
+ - en
18
+ - es
19
+ - et
20
+ - eu
21
+ - fi
22
+ - fr
23
+ - ga
24
+ - gl
25
+ - hr
26
+ - hu
27
+ - it
28
+ - lt
29
+ - lv
30
+ - mt
31
+ - nl
32
+ - nn
33
+ - \no
34
+ - oc
35
+ - pl
36
+ - pt
37
+ - ro
38
+ - ru
39
+ - sh
40
+ - sk
41
+ - sl
42
+ - sr
43
+ - sv
44
+ - uk
45
+ base_model:
46
+ - BSC-LT/salamandra-7b
47
+ ---
48
+
49
+ ![](./images/salamandra_header.png)
50
+
51
+ # Salamandra Vision Model Card
52
+
53
+ Salamandra is a highly multilingual model pre-trained from scratch that comes in three different
54
+ sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants.
55
+ This model card corresponds to the 7B vision-instructed version. At present, only the 7B model has been instruction-tuned to understand images.
56
+
57
+ To visit the model cards of other Salamandra versions, please refer to the [Model Index](#base-model-index).
58
+
59
+ > [!WARNING]
60
+ > **DISCLAIMER:** This model is a first proof-of-concept designed to demonstrate the instruction-following capabilities of recently released base models.
61
+ > It has been optimized to engage in conversation but has *NOT* been aligned through RLHF to filter or avoid sensitive topics.
62
+ > As a result, it may generate harmful or inappropriate content.
63
+ > The team is actively working to enhance its performance through further instruction tuning and alignment with RL techniques.
64
+
65
+ ---
66
+
67
+ ## Model Details
68
+
69
+ ### Description
70
+
71
+ We have adapted Salamandra to process images and videos. This was achieved through a late-fusion approach that integrates a pre-trained image encoder, a pre-trained LLM, and a projector. Training focuses on mapping the encoder's image embeddings into the LLM's embedding space, enabling the model to handle this new modality.
72
+
73
+ Salamandra is a transformer-based decoder-only language model that has been pre-trained from scratch on 7.8 trillion tokens of highly curated data.
74
+ The pre-training corpus contains text in 35 European languages and code.
75
+
76
+ ### Hyperparameters
77
+
78
+ The full list of hyperparameters can be found [here](https://huggingface.co/BSC-LT/salamandra-7b-vision/blob/main/config.json).
79
+
80
+ ### Framework
81
+
82
+ We utilized the [Llava Onevision](https://arxiv.org/abs/2408.03326) technique to train our vision model.
83
+
84
+ The model comprises a pre-trained encoder ([Google SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384), patch size 14, 384x384 resolution), our 7B instruction-tuned model as the LLM, and a projector (a 2-layer MLP) initialized from scratch.
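+
+ For a quick sanity check, these components can be read directly from this repository's `config.json` (a minimal sketch, assuming the checkpoint is available locally or reachable on the Hub):
+
+ ```python
+ from transformers import AutoConfig
+
+ cfg = AutoConfig.from_pretrained("BSC-LT/salamandra-7b-vision")
+
+ # Vision tower: SigLIP, 384x384 inputs split into 14x14 patches
+ print(cfg.vision_config.model_type, cfg.vision_config.image_size, cfg.vision_config.patch_size)
+
+ # Language model: the Llama-architecture Salamandra 7B backbone
+ print(cfg.text_config.model_type, cfg.text_config.vocab_size)
+
+ # Activation used by the projector that bridges the two
+ print(cfg.projector_hidden_act)
+ ```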
85
+
86
+ ---
87
+
88
+ ## Intended Use
89
+
90
+ ### Out-of-scope Use
91
+
92
+ The model is not intended for malicious activities, such as harming others or violating human rights.
93
+ Any downstream application must comply with current laws and regulations.
94
+ Irresponsible usage in production environments without proper risk assessment and mitigation is also discouraged.
95
+
96
+ ---
97
+
98
+ ## Hardware and Software
99
+
100
+ ### Training Framework
101
+
102
+ The visual instruction-tuned versions were produced with the [LLaVA-OneVision](https://github.com/LLaVA-VL/LLaVA-NeXT) codebase.
103
+
104
+ ### Compute Infrastructure
105
+
106
+ All models were trained on [MareNostrum 5](https://www.bsc.es/ca/marenostrum/marenostrum-5), a pre-exascale EuroHPC supercomputer hosted and
107
+ operated by Barcelona Supercomputing Center.
108
+
109
+ The accelerated partition is composed of 1,120 nodes with the following specifications:
110
+ - 4x Nvidia Hopper GPUs with 64GB of HBM2 memory
111
+ - 2x Intel Sapphire Rapids 8460Y+ at 2.3GHz with 32 cores each (64 cores in total)
112
+ - 4x NDR200 (BW per node 800Gb/s)
113
+ - 512 GB of Main memory (DDR5)
114
+ - 460GB of NVMe storage
115
+
116
+ ---
117
+
118
+ ## How to use
119
+
120
+ ```python
121
+ from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration
122
+ import torch
123
+ from PIL import Image
+ import requests
124
+
125
+ path = "BSC-LT/salamandra-7b-vision"
126
+
127
+ processor = AutoProcessor.from_pretrained(path)
128
+ model = LlavaOnevisionForConditionalGeneration.from_pretrained(path, torch_dtype=torch.float16, low_cpu_mem_usage=True).to("cuda")
129
+
130
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
131
+
132
+ image = Image.open(requests.get(url, stream=True).raw)  # fetch the image from the URL
133
+
134
+ conversation = [
135
+ {
136
+ "role": "user",
137
+ "content": [
138
+ {"type": "image"},
139
+ {"type": "text", "text": "Describe la imagen con el mayor detalle posible."},
140
+ ],
141
+ },
142
+ ]
143
+ prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
144
+ inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
145
+
146
+ output = model.generate(**inputs,
147
+ do_sample=True,  # sampling must be enabled for temperature to take effect
+ temperature=0.7,
148
+ max_new_tokens=1024)
149
+
150
+ output_tokens = output[0].tolist()
151
+
152
+ print(processor.decode(output[0], skip_special_tokens=True))
153
+ ```
154
+ With this template, each turn begins with the `<|im_start|>` delimiter followed by the role of the entity
155
+ (either `user`, for content supplied by the user, or `assistant` for model responses) and ends with the `<|im_end|>` token.
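+
+ For reference, with the conversation used in the snippet above, `processor.apply_chat_template(conversation, add_generation_prompt=True)` should render a prompt along the lines of the following (a sketch based on the `chat_template.json` shipped with this checkpoint; exact whitespace may differ):
+
+ ```python
+ # Prompt rendered by the chat template for one user turn containing an image and text
+ expected_prompt = (
+     "<|im_start|>user\n"
+     "<image>\n"
+     "Describe la imagen con el mayor detalle posible.<|im_end|>\n"
+     "<|im_start|>assistant\n"
+ )
+ ```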
156
+
157
+ ---
158
+
159
+ ## Data
160
+
161
+ The data distribution used to fine-tune the model is illustrated in the figure below. Most of the datasets were sourced from the LLaVA-OneVision preprocessed data, which includes AI2D, Cambrian, and high-quality datasets such as the re-captioned detailed description data from LLaVA-NeXT.
162
+ Diverse thematic data were included to enhance the model's capabilities in subtasks such as grounding, OCR, document understanding, and math. Additionally, we incorporated text-only multilingual data in various European languages and high-quality text-only data in Spanish, Catalan, Galician, and Basque, which were also used in the instruction tuning stage.
163
+
164
+ ![](./images/data_distribution.png)
165
+
166
+ ---
167
+
168
+ ## Evaluation
169
+
170
+ As there is a lack of multimodal multilingual evaluation data, we haven't performed a thorough multilingual evaluation yet (coming soon). The English evaluations are shown in the table below:
171
+
172
+ | Task | Subtask | Metric | Value |
173
+ |--------------|-------------------------|-------------------------|-----------|
174
+ | ai2d | | exact_match | 0.7451 |
175
+ | mme | cognition_score | mme_cognition_score | 246.4286 |
176
+ | | perception_score | mme_perception_score | 1371.8164 |
177
+ | mmmu_val | | accuracy | 0.3689 |
178
+ | mmstar | average | accuracy | 0.4865 |
179
+ | | coarse perception | accuracy | 0.7127 |
180
+ | | fine-grained perception | accuracy | 0.3799 |
181
+ | | instance reasoning | accuracy | 0.5674 |
182
+ | | logical reasoning | accuracy | 0.4478 |
183
+ | | math | accuracy | 0.4279 |
184
+ | | science & technology | accuracy | 0.3832 |
185
+ | realworldqa | | exact_match | 0.5699 |
186
+ | mmbench_en_dev | | exact_match | 0.7113 |
187
+
188
+ ---
189
+
190
+ ## Ethical Considerations and Limitations
191
+
192
+ This model is an initial prototype, and we have not yet conducted a thorough evaluation of societal and cognitive biases. In future iterations, we plan to assess potential biases using established benchmarks, following methodologies similar to those applied in previous models.
193
+
194
+ We acknowledge that bias evaluation is a critical step in responsible model development. Given the ongoing nature of this work, we strongly encourage developers to conduct safety assessments and bias mitigation strategies tailored to their specific applications of the model. Future updates will include more comprehensive analyses as we continue improving this model.
195
+
196
+ ---
197
+
198
+ ## Additional information
199
+
200
+ ### Author
201
+ The Language Technologies Lab from Barcelona Supercomputing Center.
202
+
203
+ ### Contact
204
+ For further information, please send an email to <[email protected]>.
205
+
206
+ ### Copyright
207
+ Copyright (c) 2025 by Language Technologies Lab, Barcelona Supercomputing Center.
208
+
209
+ ### Funding
210
+ This work has been promoted and financed by the Ministerio para la Transformación Digital y de la Función Pública and the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union through NextGenerationEU, within the framework of the project Modelos del Lenguaje.
211
+
212
+ ### Disclaimer
213
+ Be aware that the model may contain biases or other unintended distortions.
214
+ When third parties deploy systems or provide services based on this model, or use the model themselves,
215
+ they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable regulations,
216
+ including those governing the use of Artificial Intelligence.
217
+
218
+ The Barcelona Supercomputing Center, as the owner and creator of the model, shall not be held liable for any outcomes resulting from third-party use.
219
+
220
+ ### Citation
221
+ ```
222
+ @misc{gonzalezagirre2025salamandratechnicalreport,
223
+ title={Salamandra Technical Report},
224
+ author={Aitor Gonzalez-Agirre and Marc Pàmies and Joan Llop and Irene Baucells and Severino Da Dalt and Daniel Tamayo and José Javier Saiz and Ferran Espuña and Jaume Prats and Javier Aula-Blasco and Mario Mina and Adrián Rubio and Alexander Shvets and Anna Sallés and Iñaki Lacunza and Iñigo Pikabea and Jorge Palomar and Júlia Falcão and Lucía Tormo and Luis Vasquez-Reina and Montserrat Marimon and Valle Ruíz-Fernández and Marta Villegas},
225
+ year={2025},
226
+ eprint={2502.08489},
227
+ archivePrefix={arXiv},
228
+ primaryClass={cs.CL},
229
+ url={https://arxiv.org/abs/2502.08489},
230
+ }
231
+ ```
232
+
233
+ ### License
234
+ [RESEARCH-ONLY RAIL-AMS](https://huggingface.co/BSC-LT/salamandra-7b-vision/blob/main/LICENSE)
235
+
236
+ ## Base Model Index
237
+ |Model|Base|Instruct|
238
+ |:---:|:---:|:---:|
239
+ |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
240
+ |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
241
+ |40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
242
+
243
+ <details>
244
+ <summary>References</summary>
245
+ - Li, B., Zhang, Y., Guo, D., Zhang, R., Li, F., Zhang, H., Zhang, K., Zhang, P., Li, Y., Liu, Z., & Li, C. (2024). Llava-OneVision: Easy Visual Task Transfer. [Link](https://arxiv.org/abs/2408.03326)
246
+ - Liu, H., Li, C., Li, Y., Li, B., Zhang, Y., Shen, S., & Lee, Y. J. (2024). Llava-Next: Improved Reasoning, OCR, and World Knowledge. [Link](https://llava-vl.github.io/blog/2024-01-30-llava-next/)
247
+ - Kembhavi, A., Salvato, M., Kolve, E., Seo, M., Hajishirzi, H., & Farhadi, A. (2016). A Diagram is Worth a Dozen Images. [Link](https://arxiv.org/abs/1603.07396)
248
+ - Tong, S., Brown, E., Wu, P., Woo, S., Middepogu, M., Akula, S. C., Yang, J., Yang, S., Iyer, A., Pan, X., Wang, Z., Fergus, R., LeCun, Y., & Xie, S. (2024). Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. [Link](https://arxiv.org/abs/2406.16860)
249
+ - Zhai, X., Mustafa, B., Kolesnikov, A., & Beyer, L. (2023). Sigmoid Loss for Language Image Pre-Training. [Link](https://arxiv.org/abs/2303.15343)
250
+ </details>
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n'}}{# Render all images first #}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<image>\n' }}{% endfor %}{# Render all video then #}{% for content in message['content'] | selectattr('type', 'equalto', 'video') %}{{ '<video>\n' }}{% endfor %}{# Render all text next #}{% if message['role'] != 'assistant' %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] }}{% endfor %}{% else %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{% generation %}{{ content['text'] }}{% endgeneration %}{% endfor %}{% endif %}{{'<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
3
+ }
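
The `{% generation %}` … `{% endgeneration %}` markers wrap assistant text so that training code can recover an assistant-token mask from the rendered conversation. A minimal sketch, assuming a recent `transformers` version in which the tokenizer's `apply_chat_template` supports `return_assistant_tokens_mask` and the processor exposes the loaded template as `processor.chat_template` (the conversation below is purely illustrative):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("BSC-LT/salamandra-7b-vision")

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "Hello!"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "Hi! How can I help?"}]},
]

# Tokenize with the template above and flag which tokens fall inside {% generation %} blocks.
encoded = processor.tokenizer.apply_chat_template(
    conversation,
    chat_template=processor.chat_template,
    tokenize=True,
    return_dict=True,
    return_assistant_tokens_mask=True,
)
print(sorted(encoded.keys()))        # includes 'input_ids' and 'assistant_masks'
print(encoded["assistant_masks"])    # 1 for tokens inside {% generation %} spans, 0 elsewhere
```
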
config.json ADDED
@@ -0,0 +1,191 @@
1
+ {
2
+ "architectures": [
3
+ "LlavaOnevisionForConditionalGeneration"
4
+ ],
5
+ "image_grid_pinpoints": [
6
+ [
7
+ 384,
8
+ 384
9
+ ],
10
+ [
11
+ 384,
12
+ 768
13
+ ],
14
+ [
15
+ 384,
16
+ 1152
17
+ ],
18
+ [
19
+ 384,
20
+ 1536
21
+ ],
22
+ [
23
+ 384,
24
+ 1920
25
+ ],
26
+ [
27
+ 384,
28
+ 2304
29
+ ],
30
+ [
31
+ 768,
32
+ 384
33
+ ],
34
+ [
35
+ 768,
36
+ 768
37
+ ],
38
+ [
39
+ 768,
40
+ 1152
41
+ ],
42
+ [
43
+ 768,
44
+ 1536
45
+ ],
46
+ [
47
+ 768,
48
+ 1920
49
+ ],
50
+ [
51
+ 768,
52
+ 2304
53
+ ],
54
+ [
55
+ 1152,
56
+ 384
57
+ ],
58
+ [
59
+ 1152,
60
+ 768
61
+ ],
62
+ [
63
+ 1152,
64
+ 1152
65
+ ],
66
+ [
67
+ 1152,
68
+ 1536
69
+ ],
70
+ [
71
+ 1152,
72
+ 1920
73
+ ],
74
+ [
75
+ 1152,
76
+ 2304
77
+ ],
78
+ [
79
+ 1536,
80
+ 384
81
+ ],
82
+ [
83
+ 1536,
84
+ 768
85
+ ],
86
+ [
87
+ 1536,
88
+ 1152
89
+ ],
90
+ [
91
+ 1536,
92
+ 1536
93
+ ],
94
+ [
95
+ 1536,
96
+ 1920
97
+ ],
98
+ [
99
+ 1536,
100
+ 2304
101
+ ],
102
+ [
103
+ 1920,
104
+ 384
105
+ ],
106
+ [
107
+ 1920,
108
+ 768
109
+ ],
110
+ [
111
+ 1920,
112
+ 1152
113
+ ],
114
+ [
115
+ 1920,
116
+ 1536
117
+ ],
118
+ [
119
+ 1920,
120
+ 1920
121
+ ],
122
+ [
123
+ 1920,
124
+ 2304
125
+ ],
126
+ [
127
+ 2304,
128
+ 384
129
+ ],
130
+ [
131
+ 2304,
132
+ 768
133
+ ],
134
+ [
135
+ 2304,
136
+ 1152
137
+ ],
138
+ [
139
+ 2304,
140
+ 1536
141
+ ],
142
+ [
143
+ 2304,
144
+ 1920
145
+ ],
146
+ [
147
+ 2304,
148
+ 2304
149
+ ]
150
+ ],
151
+ "image_token_index": 7,
152
+ "model_type": "llava_onevision",
153
+ "projector_hidden_act": "gelu",
154
+ "text_config": {
155
+ "_name_or_path": "/gpfs/projects/bsc88/text/models/vision/OneVision/checkpoints/bsc/bsc_7b_hf_llama_instructed",
156
+ "architectures": [
157
+ "LlamaForCausalLM"
158
+ ],
159
+ "attention_bias": false,
160
+ "bos_token_id": 4,
161
+ "eos_token_id": 5,
162
+ "head_dim": 128,
163
+ "intermediate_size": 11008,
164
+ "max_position_embeddings": 8192,
165
+ "mlp_bias": false,
166
+ "model_type": "llama",
167
+ "num_key_value_heads": 8,
168
+ "pretraining_tp": 1,
169
+ "torch_dtype": "bfloat16",
170
+ "use_cache": false,
171
+ "vocab_size": 256000
172
+ },
173
+ "tie_word_embeddings": false,
174
+ "torch_dtype": "float16",
175
+ "transformers_version": "4.46.0.dev0",
176
+ "use_image_newline_parameter": true,
177
+ "video_token_index": 8,
178
+ "vision_aspect_ratio": "anyres_max_9",
179
+ "vision_config": {
180
+ "hidden_size": 1152,
181
+ "image_size": 384,
182
+ "intermediate_size": 4304,
183
+ "model_type": "siglip_vision_model",
184
+ "num_attention_heads": 16,
185
+ "num_hidden_layers": 26,
186
+ "patch_size": 14,
187
+ "vision_use_head": false
188
+ },
189
+ "vision_feature_layer": -1,
190
+ "vision_feature_select_strategy": "full"
191
+ }
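
The `image_grid_pinpoints` list, together with `"vision_aspect_ratio": "anyres_max_9"`, tells the image processor which tiled resolutions it may use for high-resolution inputs. The sketch below only illustrates the general selection idea (it is not the exact `transformers` implementation), and it assumes pinpoints are `[height, width]` pairs as listed above:

```python
def pick_pinpoint(width, height, pinpoints):
    """Pick the grid resolution that keeps the most image detail with the least padding."""
    best, best_key = None, None
    for ph, pw in pinpoints:  # assumed [height, width] order, as in config.json
        scale = min(pw / width, ph / height)
        # Usable pixels after resizing, capped at the original resolution (no credit for upscaling).
        effective = min(int(width * scale) * int(height * scale), width * height)
        wasted = pw * ph - effective
        key = (-effective, wasted)
        if best_key is None or key < best_key:
            best, best_key = [ph, pw], key
    return best

# A 1280x720 image maps to the 768x1536 (height x width) grid under this heuristic.
print(pick_pinpoint(1280, 720, [[384, 384], [384, 768], [768, 1536], [1152, 2304]]))
```
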
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 4,
4
+ "eos_token_id": 5,
5
+ "transformers_version": "4.46.0.dev0",
6
+ "use_cache": false
7
+ }
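
These generation defaults (note `use_cache: false` and the custom `bos_token_id`/`eos_token_id`) are applied automatically when the model is loaded. A minimal sketch of inspecting and overriding them, assuming the standard `transformers` API:

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("BSC-LT/salamandra-7b-vision")
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id, gen_cfg.use_cache)  # 4 5 False

# Individual settings can be overridden per call, e.g. re-enabling the KV cache:
# output = model.generate(**inputs, max_new_tokens=256, use_cache=True)
```
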
images/salamandra_header.png ADDED

Git LFS Details

  • SHA256: de12bec43f22c0c41b45b84425759d6c9e38ecdf06d58519f048f10fe6e826de
  • Pointer size: 133 Bytes
  • Size of remote file: 11.1 MB
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:56d64993f237a98daa8f7a3e5673a774fa096a9f253ea63a160550e681dac083
3
+ size 4972145224
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2c29aedb51f05ae3213491e0d33f10403541361798d31aadc3da70e4e5d39669
3
+ size 4962107424
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f3e1d8f0764ea4fd79fa7657b1f36b309e5e2c575d208fcff90ba82419de7799
3
+ size 4343437568
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0331b60dd9c98f4918f99cac5dafcd2b3c712ff7ed2f0697e7a2f4393ef387c
3
+ size 2097152144
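
Each of the four weight shards is stored with Git LFS; the pointer files above record the SHA-256 digest (`oid`) and byte size of the real payload. A small sketch for verifying a locally downloaded shard against its pointer (the local path is hypothetical):

```python
import hashlib

# Expected digest taken from the LFS pointer of the first shard (see above).
EXPECTED_OID = "56d64993f237a98daa8f7a3e5673a774fa096a9f253ea63a160550e681dac083"

def sha256sum(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Hypothetical local path to the downloaded shard.
assert sha256sum("model-00001-of-00004.safetensors") == EXPECTED_OID
```
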
model.safetensors.index.json ADDED
@@ -0,0 +1,724 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 16374746176
4
+ },
5
+ "weight_map": {
6
+ "image_newline": "model-00001-of-00004.safetensors",
7
+ "language_model.lm_head.weight": "model-00004-of-00004.safetensors",
8
+ "language_model.model.embed_tokens.weight": "model-00001-of-00004.safetensors",
9
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
10
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
11
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
12
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
13
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
14
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
15
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
17
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
18
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
19
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
20
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
21
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
22
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
23
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
24
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
25
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
26
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
27
+ "language_model.model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
28
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
29
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
30
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
31
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
32
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
33
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
34
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
35
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
36
+ "language_model.model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
38
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
39
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
40
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
41
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
42
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
43
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
44
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
45
+ "language_model.model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
46
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
47
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
48
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
49
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
50
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
52
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
53
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
54
+ "language_model.model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
55
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
56
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
57
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
58
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
59
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
60
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
61
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
62
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
63
+ "language_model.model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
64
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
65
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
66
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
67
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
68
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
69
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
70
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
71
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
72
+ "language_model.model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
73
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
74
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
75
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
76
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
77
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
78
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
79
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
80
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
81
+ "language_model.model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
82
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
83
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
84
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
85
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
86
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
87
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
88
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
89
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
90
+ "language_model.model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
91
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
92
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
93
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
94
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
95
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
96
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
97
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
98
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
99
+ "language_model.model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
100
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
101
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
102
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
103
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
104
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
105
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
106
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
107
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
108
+ "language_model.model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
109
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
110
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
111
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
112
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
113
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
114
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
115
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
116
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
117
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
118
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
119
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
120
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
121
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
122
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
123
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
124
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
125
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
126
+ "language_model.model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
127
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
128
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
129
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
130
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
131
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
132
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
133
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
134
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
135
+ "language_model.model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
136
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
137
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
138
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
139
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
140
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
141
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
142
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
143
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
144
+ "language_model.model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
145
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
146
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
147
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
148
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
149
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
150
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
151
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
152
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
153
+ "language_model.model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
154
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
155
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
156
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
157
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
158
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
159
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
160
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
161
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
162
+ "language_model.model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
163
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
164
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
165
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
166
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
167
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
168
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
169
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
170
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
171
+ "language_model.model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
172
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
173
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
174
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
175
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
176
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
177
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
178
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
179
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
180
+ "language_model.model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
181
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
182
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
183
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
184
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
185
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
186
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
187
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
188
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
189
+ "language_model.model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
190
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
191
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
192
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
193
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
194
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
195
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
196
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
197
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
198
+ "language_model.model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
199
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
200
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
201
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
202
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
203
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
204
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
205
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
206
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
207
+ "language_model.model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
208
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
209
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
210
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
211
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
212
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
213
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
214
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
215
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
216
+ "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
217
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
218
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
219
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
220
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
221
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
222
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
223
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
224
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
225
+ "language_model.model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
226
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
227
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
228
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
229
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
230
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
231
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
232
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
233
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
234
+ "language_model.model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
235
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
236
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
237
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
238
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
239
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
240
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
241
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
242
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
243
+ "language_model.model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
244
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
245
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
246
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
247
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
248
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
249
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
250
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
251
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
252
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00004.safetensors",
253
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
254
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
255
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
256
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
257
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
258
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
259
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
260
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
261
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00004.safetensors",
262
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
263
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
264
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
265
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
266
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
267
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
268
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
269
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
270
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
271
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
272
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
273
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
274
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
275
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
276
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
277
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
278
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
279
+ "language_model.model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
280
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
281
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
282
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
283
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
284
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
285
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
286
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
287
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
288
+ "language_model.model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
289
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
290
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
291
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
292
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
293
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
294
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
295
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
296
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
297
+ "language_model.model.norm.weight": "model-00003-of-00004.safetensors",
298
+ "multi_modal_projector.linear_1.bias": "model-00001-of-00004.safetensors",
299
+ "multi_modal_projector.linear_1.weight": "model-00001-of-00004.safetensors",
300
+ "multi_modal_projector.linear_2.bias": "model-00001-of-00004.safetensors",
301
+ "multi_modal_projector.linear_2.weight": "model-00001-of-00004.safetensors",
302
+ "vision_tower.vision_model.embeddings.patch_embedding.bias": "model-00001-of-00004.safetensors",
303
+ "vision_tower.vision_model.embeddings.patch_embedding.weight": "model-00001-of-00004.safetensors",
304
+ "vision_tower.vision_model.embeddings.position_embedding.weight": "model-00001-of-00004.safetensors",
305
+ "vision_tower.vision_model.encoder.layers.0.layer_norm1.bias": "model-00001-of-00004.safetensors",
306
+ "vision_tower.vision_model.encoder.layers.0.layer_norm1.weight": "model-00001-of-00004.safetensors",
307
+ "vision_tower.vision_model.encoder.layers.0.layer_norm2.bias": "model-00001-of-00004.safetensors",
308
+ "vision_tower.vision_model.encoder.layers.0.layer_norm2.weight": "model-00001-of-00004.safetensors",
309
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias": "model-00001-of-00004.safetensors",
310
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight": "model-00001-of-00004.safetensors",
311
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias": "model-00001-of-00004.safetensors",
312
+ "vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight": "model-00001-of-00004.safetensors",
313
+ "vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
314
+ "vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
315
+ "vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
316
+ "vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
317
+ "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
318
+ "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
319
+ "vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
320
+ "vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
321
+ "vision_tower.vision_model.encoder.layers.1.layer_norm1.bias": "model-00001-of-00004.safetensors",
322
+ "vision_tower.vision_model.encoder.layers.1.layer_norm1.weight": "model-00001-of-00004.safetensors",
323
+ "vision_tower.vision_model.encoder.layers.1.layer_norm2.bias": "model-00001-of-00004.safetensors",
324
+ "vision_tower.vision_model.encoder.layers.1.layer_norm2.weight": "model-00001-of-00004.safetensors",
325
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias": "model-00001-of-00004.safetensors",
326
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight": "model-00001-of-00004.safetensors",
327
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias": "model-00001-of-00004.safetensors",
328
+ "vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight": "model-00001-of-00004.safetensors",
329
+ "vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
330
+ "vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
331
+ "vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
332
+ "vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
333
+ "vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
334
+ "vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
335
+ "vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
336
+ "vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
337
+ "vision_tower.vision_model.encoder.layers.10.layer_norm1.bias": "model-00001-of-00004.safetensors",
338
+ "vision_tower.vision_model.encoder.layers.10.layer_norm1.weight": "model-00001-of-00004.safetensors",
339
+ "vision_tower.vision_model.encoder.layers.10.layer_norm2.bias": "model-00001-of-00004.safetensors",
340
+ "vision_tower.vision_model.encoder.layers.10.layer_norm2.weight": "model-00001-of-00004.safetensors",
341
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias": "model-00001-of-00004.safetensors",
342
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight": "model-00001-of-00004.safetensors",
343
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias": "model-00001-of-00004.safetensors",
344
+ "vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight": "model-00001-of-00004.safetensors",
345
+ "vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
346
+ "vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
347
+ "vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
348
+ "vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
349
+ "vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
350
+ "vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
351
+ "vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
352
+ "vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
353
+ "vision_tower.vision_model.encoder.layers.11.layer_norm1.bias": "model-00001-of-00004.safetensors",
354
+ "vision_tower.vision_model.encoder.layers.11.layer_norm1.weight": "model-00001-of-00004.safetensors",
355
+ "vision_tower.vision_model.encoder.layers.11.layer_norm2.bias": "model-00001-of-00004.safetensors",
356
+ "vision_tower.vision_model.encoder.layers.11.layer_norm2.weight": "model-00001-of-00004.safetensors",
357
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias": "model-00001-of-00004.safetensors",
358
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight": "model-00001-of-00004.safetensors",
359
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias": "model-00001-of-00004.safetensors",
360
+ "vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight": "model-00001-of-00004.safetensors",
361
+ "vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
362
+ "vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
363
+ "vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
364
+ "vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
365
+ "vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
366
+ "vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
367
+ "vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
368
+ "vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
369
+ "vision_tower.vision_model.encoder.layers.12.layer_norm1.bias": "model-00001-of-00004.safetensors",
370
+ "vision_tower.vision_model.encoder.layers.12.layer_norm1.weight": "model-00001-of-00004.safetensors",
371
+ "vision_tower.vision_model.encoder.layers.12.layer_norm2.bias": "model-00001-of-00004.safetensors",
372
+ "vision_tower.vision_model.encoder.layers.12.layer_norm2.weight": "model-00001-of-00004.safetensors",
373
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias": "model-00001-of-00004.safetensors",
374
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight": "model-00001-of-00004.safetensors",
375
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias": "model-00001-of-00004.safetensors",
376
+ "vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight": "model-00001-of-00004.safetensors",
377
+ "vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
378
+ "vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
379
+ "vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
380
+ "vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
381
+ "vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
382
+ "vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
383
+ "vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
384
+ "vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
385
+ "vision_tower.vision_model.encoder.layers.13.layer_norm1.bias": "model-00001-of-00004.safetensors",
386
+ "vision_tower.vision_model.encoder.layers.13.layer_norm1.weight": "model-00001-of-00004.safetensors",
387
+ "vision_tower.vision_model.encoder.layers.13.layer_norm2.bias": "model-00001-of-00004.safetensors",
388
+ "vision_tower.vision_model.encoder.layers.13.layer_norm2.weight": "model-00001-of-00004.safetensors",
389
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias": "model-00001-of-00004.safetensors",
390
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight": "model-00001-of-00004.safetensors",
391
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias": "model-00001-of-00004.safetensors",
392
+ "vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight": "model-00001-of-00004.safetensors",
393
+ "vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
394
+ "vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
395
+ "vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
396
+ "vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
397
+ "vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
398
+ "vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
399
+ "vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
400
+ "vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
401
+ "vision_tower.vision_model.encoder.layers.14.layer_norm1.bias": "model-00001-of-00004.safetensors",
402
+ "vision_tower.vision_model.encoder.layers.14.layer_norm1.weight": "model-00001-of-00004.safetensors",
403
+ "vision_tower.vision_model.encoder.layers.14.layer_norm2.bias": "model-00001-of-00004.safetensors",
404
+ "vision_tower.vision_model.encoder.layers.14.layer_norm2.weight": "model-00001-of-00004.safetensors",
405
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias": "model-00001-of-00004.safetensors",
406
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight": "model-00001-of-00004.safetensors",
407
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias": "model-00001-of-00004.safetensors",
408
+ "vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight": "model-00001-of-00004.safetensors",
409
+ "vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
410
+ "vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
411
+ "vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
412
+ "vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
413
+ "vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
414
+ "vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
415
+ "vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
416
+ "vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
417
+ "vision_tower.vision_model.encoder.layers.15.layer_norm1.bias": "model-00001-of-00004.safetensors",
418
+ "vision_tower.vision_model.encoder.layers.15.layer_norm1.weight": "model-00001-of-00004.safetensors",
419
+ "vision_tower.vision_model.encoder.layers.15.layer_norm2.bias": "model-00001-of-00004.safetensors",
420
+ "vision_tower.vision_model.encoder.layers.15.layer_norm2.weight": "model-00001-of-00004.safetensors",
421
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias": "model-00001-of-00004.safetensors",
422
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight": "model-00001-of-00004.safetensors",
423
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias": "model-00001-of-00004.safetensors",
424
+ "vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight": "model-00001-of-00004.safetensors",
425
+ "vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
426
+ "vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
427
+ "vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
428
+ "vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
429
+ "vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
430
+ "vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
431
+ "vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
432
+ "vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
433
+ "vision_tower.vision_model.encoder.layers.16.layer_norm1.bias": "model-00001-of-00004.safetensors",
434
+ "vision_tower.vision_model.encoder.layers.16.layer_norm1.weight": "model-00001-of-00004.safetensors",
435
+ "vision_tower.vision_model.encoder.layers.16.layer_norm2.bias": "model-00001-of-00004.safetensors",
436
+ "vision_tower.vision_model.encoder.layers.16.layer_norm2.weight": "model-00001-of-00004.safetensors",
437
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias": "model-00001-of-00004.safetensors",
438
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight": "model-00001-of-00004.safetensors",
439
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias": "model-00001-of-00004.safetensors",
440
+ "vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight": "model-00001-of-00004.safetensors",
441
+ "vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
442
+ "vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
443
+ "vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
444
+ "vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
445
+ "vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
446
+ "vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
447
+ "vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
448
+ "vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
449
+ "vision_tower.vision_model.encoder.layers.17.layer_norm1.bias": "model-00001-of-00004.safetensors",
450
+ "vision_tower.vision_model.encoder.layers.17.layer_norm1.weight": "model-00001-of-00004.safetensors",
451
+ "vision_tower.vision_model.encoder.layers.17.layer_norm2.bias": "model-00001-of-00004.safetensors",
452
+ "vision_tower.vision_model.encoder.layers.17.layer_norm2.weight": "model-00001-of-00004.safetensors",
453
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias": "model-00001-of-00004.safetensors",
454
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight": "model-00001-of-00004.safetensors",
455
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias": "model-00001-of-00004.safetensors",
456
+ "vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight": "model-00001-of-00004.safetensors",
457
+ "vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
458
+ "vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
459
+ "vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
460
+ "vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
461
+ "vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
462
+ "vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
463
+ "vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
464
+ "vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
465
+ "vision_tower.vision_model.encoder.layers.18.layer_norm1.bias": "model-00001-of-00004.safetensors",
466
+ "vision_tower.vision_model.encoder.layers.18.layer_norm1.weight": "model-00001-of-00004.safetensors",
467
+ "vision_tower.vision_model.encoder.layers.18.layer_norm2.bias": "model-00001-of-00004.safetensors",
468
+ "vision_tower.vision_model.encoder.layers.18.layer_norm2.weight": "model-00001-of-00004.safetensors",
469
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias": "model-00001-of-00004.safetensors",
470
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight": "model-00001-of-00004.safetensors",
471
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias": "model-00001-of-00004.safetensors",
472
+ "vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight": "model-00001-of-00004.safetensors",
473
+ "vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
474
+ "vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
475
+ "vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
476
+ "vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
477
+ "vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
478
+ "vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
479
+ "vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
480
+ "vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
481
+ "vision_tower.vision_model.encoder.layers.19.layer_norm1.bias": "model-00001-of-00004.safetensors",
482
+ "vision_tower.vision_model.encoder.layers.19.layer_norm1.weight": "model-00001-of-00004.safetensors",
483
+ "vision_tower.vision_model.encoder.layers.19.layer_norm2.bias": "model-00001-of-00004.safetensors",
484
+ "vision_tower.vision_model.encoder.layers.19.layer_norm2.weight": "model-00001-of-00004.safetensors",
485
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias": "model-00001-of-00004.safetensors",
486
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight": "model-00001-of-00004.safetensors",
487
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias": "model-00001-of-00004.safetensors",
488
+ "vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight": "model-00001-of-00004.safetensors",
489
+ "vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
490
+ "vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
491
+ "vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
492
+ "vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
493
+ "vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
494
+ "vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
495
+ "vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
496
+ "vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
497
+ "vision_tower.vision_model.encoder.layers.2.layer_norm1.bias": "model-00001-of-00004.safetensors",
498
+ "vision_tower.vision_model.encoder.layers.2.layer_norm1.weight": "model-00001-of-00004.safetensors",
499
+ "vision_tower.vision_model.encoder.layers.2.layer_norm2.bias": "model-00001-of-00004.safetensors",
500
+ "vision_tower.vision_model.encoder.layers.2.layer_norm2.weight": "model-00001-of-00004.safetensors",
501
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias": "model-00001-of-00004.safetensors",
502
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight": "model-00001-of-00004.safetensors",
503
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias": "model-00001-of-00004.safetensors",
504
+ "vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight": "model-00001-of-00004.safetensors",
505
+ "vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
506
+ "vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
507
+ "vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
508
+ "vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
509
+ "vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
510
+ "vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
511
+ "vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
512
+ "vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
513
+ "vision_tower.vision_model.encoder.layers.20.layer_norm1.bias": "model-00001-of-00004.safetensors",
514
+ "vision_tower.vision_model.encoder.layers.20.layer_norm1.weight": "model-00001-of-00004.safetensors",
515
+ "vision_tower.vision_model.encoder.layers.20.layer_norm2.bias": "model-00001-of-00004.safetensors",
516
+ "vision_tower.vision_model.encoder.layers.20.layer_norm2.weight": "model-00001-of-00004.safetensors",
517
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias": "model-00001-of-00004.safetensors",
518
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight": "model-00001-of-00004.safetensors",
519
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias": "model-00001-of-00004.safetensors",
520
+ "vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight": "model-00001-of-00004.safetensors",
521
+ "vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
522
+ "vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
523
+ "vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
524
+ "vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
525
+ "vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
526
+ "vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
527
+ "vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
528
+ "vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
529
+ "vision_tower.vision_model.encoder.layers.21.layer_norm1.bias": "model-00001-of-00004.safetensors",
530
+ "vision_tower.vision_model.encoder.layers.21.layer_norm1.weight": "model-00001-of-00004.safetensors",
531
+ "vision_tower.vision_model.encoder.layers.21.layer_norm2.bias": "model-00001-of-00004.safetensors",
532
+ "vision_tower.vision_model.encoder.layers.21.layer_norm2.weight": "model-00001-of-00004.safetensors",
533
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias": "model-00001-of-00004.safetensors",
534
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight": "model-00001-of-00004.safetensors",
535
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias": "model-00001-of-00004.safetensors",
536
+ "vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight": "model-00001-of-00004.safetensors",
537
+ "vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
538
+ "vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
539
+ "vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
540
+ "vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
541
+ "vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
542
+ "vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
543
+ "vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
544
+ "vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
545
+ "vision_tower.vision_model.encoder.layers.22.layer_norm1.bias": "model-00001-of-00004.safetensors",
546
+ "vision_tower.vision_model.encoder.layers.22.layer_norm1.weight": "model-00001-of-00004.safetensors",
547
+ "vision_tower.vision_model.encoder.layers.22.layer_norm2.bias": "model-00001-of-00004.safetensors",
548
+ "vision_tower.vision_model.encoder.layers.22.layer_norm2.weight": "model-00001-of-00004.safetensors",
549
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias": "model-00001-of-00004.safetensors",
550
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight": "model-00001-of-00004.safetensors",
551
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias": "model-00001-of-00004.safetensors",
552
+ "vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight": "model-00001-of-00004.safetensors",
553
+ "vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
554
+ "vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
555
+ "vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
556
+ "vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
557
+ "vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
558
+ "vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
559
+ "vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
560
+ "vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
561
+ "vision_tower.vision_model.encoder.layers.23.layer_norm1.bias": "model-00001-of-00004.safetensors",
562
+ "vision_tower.vision_model.encoder.layers.23.layer_norm1.weight": "model-00001-of-00004.safetensors",
563
+ "vision_tower.vision_model.encoder.layers.23.layer_norm2.bias": "model-00001-of-00004.safetensors",
564
+ "vision_tower.vision_model.encoder.layers.23.layer_norm2.weight": "model-00001-of-00004.safetensors",
565
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias": "model-00001-of-00004.safetensors",
566
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight": "model-00001-of-00004.safetensors",
567
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias": "model-00001-of-00004.safetensors",
568
+ "vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight": "model-00001-of-00004.safetensors",
569
+ "vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
570
+ "vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
571
+ "vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
572
+ "vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
573
+ "vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
574
+ "vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
575
+ "vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
576
+ "vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
577
+ "vision_tower.vision_model.encoder.layers.24.layer_norm1.bias": "model-00001-of-00004.safetensors",
578
+ "vision_tower.vision_model.encoder.layers.24.layer_norm1.weight": "model-00001-of-00004.safetensors",
579
+ "vision_tower.vision_model.encoder.layers.24.layer_norm2.bias": "model-00001-of-00004.safetensors",
580
+ "vision_tower.vision_model.encoder.layers.24.layer_norm2.weight": "model-00001-of-00004.safetensors",
581
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias": "model-00001-of-00004.safetensors",
582
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight": "model-00001-of-00004.safetensors",
583
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias": "model-00001-of-00004.safetensors",
584
+ "vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight": "model-00001-of-00004.safetensors",
585
+ "vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
586
+ "vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
587
+ "vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
588
+ "vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
589
+ "vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
590
+ "vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
591
+ "vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
592
+ "vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
593
+ "vision_tower.vision_model.encoder.layers.25.layer_norm1.bias": "model-00001-of-00004.safetensors",
594
+ "vision_tower.vision_model.encoder.layers.25.layer_norm1.weight": "model-00001-of-00004.safetensors",
595
+ "vision_tower.vision_model.encoder.layers.25.layer_norm2.bias": "model-00001-of-00004.safetensors",
596
+ "vision_tower.vision_model.encoder.layers.25.layer_norm2.weight": "model-00001-of-00004.safetensors",
597
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias": "model-00001-of-00004.safetensors",
598
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight": "model-00001-of-00004.safetensors",
599
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias": "model-00001-of-00004.safetensors",
600
+ "vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight": "model-00001-of-00004.safetensors",
601
+ "vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
602
+ "vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
603
+ "vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
604
+ "vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
605
+ "vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
606
+ "vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
607
+ "vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
608
+ "vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
609
+ "vision_tower.vision_model.encoder.layers.3.layer_norm1.bias": "model-00001-of-00004.safetensors",
610
+ "vision_tower.vision_model.encoder.layers.3.layer_norm1.weight": "model-00001-of-00004.safetensors",
611
+ "vision_tower.vision_model.encoder.layers.3.layer_norm2.bias": "model-00001-of-00004.safetensors",
612
+ "vision_tower.vision_model.encoder.layers.3.layer_norm2.weight": "model-00001-of-00004.safetensors",
613
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias": "model-00001-of-00004.safetensors",
614
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight": "model-00001-of-00004.safetensors",
615
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias": "model-00001-of-00004.safetensors",
616
+ "vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight": "model-00001-of-00004.safetensors",
617
+ "vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
618
+ "vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
619
+ "vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
620
+ "vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
621
+ "vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
622
+ "vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
623
+ "vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
624
+ "vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
625
+ "vision_tower.vision_model.encoder.layers.4.layer_norm1.bias": "model-00001-of-00004.safetensors",
626
+ "vision_tower.vision_model.encoder.layers.4.layer_norm1.weight": "model-00001-of-00004.safetensors",
627
+ "vision_tower.vision_model.encoder.layers.4.layer_norm2.bias": "model-00001-of-00004.safetensors",
628
+ "vision_tower.vision_model.encoder.layers.4.layer_norm2.weight": "model-00001-of-00004.safetensors",
629
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias": "model-00001-of-00004.safetensors",
630
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight": "model-00001-of-00004.safetensors",
631
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias": "model-00001-of-00004.safetensors",
632
+ "vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight": "model-00001-of-00004.safetensors",
633
+ "vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
634
+ "vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
635
+ "vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
636
+ "vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
637
+ "vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
638
+ "vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
639
+ "vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
640
+ "vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
641
+ "vision_tower.vision_model.encoder.layers.5.layer_norm1.bias": "model-00001-of-00004.safetensors",
642
+ "vision_tower.vision_model.encoder.layers.5.layer_norm1.weight": "model-00001-of-00004.safetensors",
643
+ "vision_tower.vision_model.encoder.layers.5.layer_norm2.bias": "model-00001-of-00004.safetensors",
644
+ "vision_tower.vision_model.encoder.layers.5.layer_norm2.weight": "model-00001-of-00004.safetensors",
645
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias": "model-00001-of-00004.safetensors",
646
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight": "model-00001-of-00004.safetensors",
647
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias": "model-00001-of-00004.safetensors",
648
+ "vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight": "model-00001-of-00004.safetensors",
649
+ "vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
650
+ "vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
651
+ "vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
652
+ "vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
653
+ "vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
654
+ "vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
655
+ "vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
656
+ "vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
657
+ "vision_tower.vision_model.encoder.layers.6.layer_norm1.bias": "model-00001-of-00004.safetensors",
658
+ "vision_tower.vision_model.encoder.layers.6.layer_norm1.weight": "model-00001-of-00004.safetensors",
659
+ "vision_tower.vision_model.encoder.layers.6.layer_norm2.bias": "model-00001-of-00004.safetensors",
660
+ "vision_tower.vision_model.encoder.layers.6.layer_norm2.weight": "model-00001-of-00004.safetensors",
661
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias": "model-00001-of-00004.safetensors",
662
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight": "model-00001-of-00004.safetensors",
663
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias": "model-00001-of-00004.safetensors",
664
+ "vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight": "model-00001-of-00004.safetensors",
665
+ "vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
666
+ "vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
667
+ "vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
668
+ "vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
669
+ "vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
670
+ "vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
671
+ "vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
672
+ "vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
673
+ "vision_tower.vision_model.encoder.layers.7.layer_norm1.bias": "model-00001-of-00004.safetensors",
674
+ "vision_tower.vision_model.encoder.layers.7.layer_norm1.weight": "model-00001-of-00004.safetensors",
675
+ "vision_tower.vision_model.encoder.layers.7.layer_norm2.bias": "model-00001-of-00004.safetensors",
676
+ "vision_tower.vision_model.encoder.layers.7.layer_norm2.weight": "model-00001-of-00004.safetensors",
677
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias": "model-00001-of-00004.safetensors",
678
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight": "model-00001-of-00004.safetensors",
679
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias": "model-00001-of-00004.safetensors",
680
+ "vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight": "model-00001-of-00004.safetensors",
681
+ "vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
682
+ "vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
683
+ "vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
684
+ "vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
685
+ "vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
686
+ "vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
687
+ "vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
688
+ "vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
689
+ "vision_tower.vision_model.encoder.layers.8.layer_norm1.bias": "model-00001-of-00004.safetensors",
690
+ "vision_tower.vision_model.encoder.layers.8.layer_norm1.weight": "model-00001-of-00004.safetensors",
691
+ "vision_tower.vision_model.encoder.layers.8.layer_norm2.bias": "model-00001-of-00004.safetensors",
692
+ "vision_tower.vision_model.encoder.layers.8.layer_norm2.weight": "model-00001-of-00004.safetensors",
693
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias": "model-00001-of-00004.safetensors",
694
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight": "model-00001-of-00004.safetensors",
695
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias": "model-00001-of-00004.safetensors",
696
+ "vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight": "model-00001-of-00004.safetensors",
697
+ "vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
698
+ "vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
699
+ "vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
700
+ "vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
701
+ "vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
702
+ "vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
703
+ "vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
704
+ "vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
705
+ "vision_tower.vision_model.encoder.layers.9.layer_norm1.bias": "model-00001-of-00004.safetensors",
706
+ "vision_tower.vision_model.encoder.layers.9.layer_norm1.weight": "model-00001-of-00004.safetensors",
707
+ "vision_tower.vision_model.encoder.layers.9.layer_norm2.bias": "model-00001-of-00004.safetensors",
708
+ "vision_tower.vision_model.encoder.layers.9.layer_norm2.weight": "model-00001-of-00004.safetensors",
709
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias": "model-00001-of-00004.safetensors",
710
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight": "model-00001-of-00004.safetensors",
711
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias": "model-00001-of-00004.safetensors",
712
+ "vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight": "model-00001-of-00004.safetensors",
713
+ "vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
714
+ "vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
715
+ "vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias": "model-00001-of-00004.safetensors",
716
+ "vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight": "model-00001-of-00004.safetensors",
717
+ "vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
718
+ "vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
719
+ "vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
720
+ "vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
721
+ "vision_tower.vision_model.post_layernorm.bias": "model-00001-of-00004.safetensors",
722
+ "vision_tower.vision_model.post_layernorm.weight": "model-00001-of-00004.safetensors"
723
+ }
724
+ }
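The weight map above closes the sharded safetensors index: every vision-tower tensor resolves to one of the four model-0000X-of-00004.safetensors shards. As a minimal loading sketch (assuming a transformers release that ships the LLaVA-OneVision classes; the dtype and device_map choices below are illustrative, not taken from this commit):

import torch
from transformers import LlavaOnevisionForConditionalGeneration

# from_pretrained reads model.safetensors.index.json and pulls each shard
# listed in its weight_map before assembling the full state dict.
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "BSC-LT/salamandra-7b-vision",
    torch_dtype=torch.bfloat16,   # illustrative; pick a dtype your hardware supports
    device_map="auto",            # optional; requires the accelerate package
)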
preprocessor_config.json ADDED
@@ -0,0 +1,171 @@
1
+ {
2
+ "do_convert_rgb": true,
3
+ "do_normalize": true,
4
+ "do_pad": true,
5
+ "do_rescale": true,
6
+ "do_resize": true,
7
+ "image_grid_pinpoints": [
8
+ [
9
+ 384,
10
+ 384
11
+ ],
12
+ [
13
+ 384,
14
+ 768
15
+ ],
16
+ [
17
+ 384,
18
+ 1152
19
+ ],
20
+ [
21
+ 384,
22
+ 1536
23
+ ],
24
+ [
25
+ 384,
26
+ 1920
27
+ ],
28
+ [
29
+ 384,
30
+ 2304
31
+ ],
32
+ [
33
+ 768,
34
+ 384
35
+ ],
36
+ [
37
+ 768,
38
+ 768
39
+ ],
40
+ [
41
+ 768,
42
+ 1152
43
+ ],
44
+ [
45
+ 768,
46
+ 1536
47
+ ],
48
+ [
49
+ 768,
50
+ 1920
51
+ ],
52
+ [
53
+ 768,
54
+ 2304
55
+ ],
56
+ [
57
+ 1152,
58
+ 384
59
+ ],
60
+ [
61
+ 1152,
62
+ 768
63
+ ],
64
+ [
65
+ 1152,
66
+ 1152
67
+ ],
68
+ [
69
+ 1152,
70
+ 1536
71
+ ],
72
+ [
73
+ 1152,
74
+ 1920
75
+ ],
76
+ [
77
+ 1152,
78
+ 2304
79
+ ],
80
+ [
81
+ 1536,
82
+ 384
83
+ ],
84
+ [
85
+ 1536,
86
+ 768
87
+ ],
88
+ [
89
+ 1536,
90
+ 1152
91
+ ],
92
+ [
93
+ 1536,
94
+ 1536
95
+ ],
96
+ [
97
+ 1536,
98
+ 1920
99
+ ],
100
+ [
101
+ 1536,
102
+ 2304
103
+ ],
104
+ [
105
+ 1920,
106
+ 384
107
+ ],
108
+ [
109
+ 1920,
110
+ 768
111
+ ],
112
+ [
113
+ 1920,
114
+ 1152
115
+ ],
116
+ [
117
+ 1920,
118
+ 1536
119
+ ],
120
+ [
121
+ 1920,
122
+ 1920
123
+ ],
124
+ [
125
+ 1920,
126
+ 2304
127
+ ],
128
+ [
129
+ 2304,
130
+ 384
131
+ ],
132
+ [
133
+ 2304,
134
+ 768
135
+ ],
136
+ [
137
+ 2304,
138
+ 1152
139
+ ],
140
+ [
141
+ 2304,
142
+ 1536
143
+ ],
144
+ [
145
+ 2304,
146
+ 1920
147
+ ],
148
+ [
149
+ 2304,
150
+ 2304
151
+ ]
152
+ ],
153
+ "image_mean": [
154
+ 0.5,
155
+ 0.5,
156
+ 0.5
157
+ ],
158
+ "image_processor_type": "LlavaOnevisionImageProcessor",
159
+ "image_std": [
160
+ 0.5,
161
+ 0.5,
162
+ 0.5
163
+ ],
164
+ "processor_class": "LlavaOnevisionProcessor",
165
+ "resample": 3,
166
+ "rescale_factor": 0.00392156862745098,
167
+ "size": {
168
+ "height": 384,
169
+ "width": 384
170
+ }
171
+ }
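preprocessor_config.json declares an any-resolution scheme: each image is matched to the closest entry in image_grid_pinpoints (multiples of the 384-px base tile, up to 2304 px per side), padded, normalized with mean/std 0.5, and rescaled by 1/255. A minimal usage sketch, assuming the standard AutoProcessor entry point and a hypothetical local file example.jpg:

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("BSC-LT/salamandra-7b-vision")

image = Image.open("example.jpg")  # hypothetical input image
inputs = processor(
    images=image,
    text="<image>\nDescribe the picture.",
    return_tensors="pt",
)
# pixel_values holds one 384x384 tile per cell of the selected grid pinpoint
# (plus the base tile); image_sizes keeps the original resolution for later use.
print(inputs["pixel_values"].shape, inputs["image_sizes"])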
processor_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "image_token": "<image>",
3
+ "num_image_tokens": 729,
4
+ "processor_class": "LlavaOnevisionProcessor",
5
+ "video_token": "<video>",
6
+ "vision_feature_select_strategy": "full"
7
+ }
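processor_config.json wires the multimodal processor: each <image> placeholder in the prompt stands for 729 vision features, consistent with the 27 × 27 patch grid of a 384-px SigLIP-style tower with patch size 14, and vision_feature_select_strategy "full" keeps all of them. A trivial inspection sketch, assuming the file has been downloaded locally:

import json

with open("processor_config.json") as f:
    cfg = json.load(f)

# 729 = 27 * 27 patch tokens per 384x384 tile; the processor expands every
# occurrence of the image token to this many placeholder positions.
print(cfg["image_token"], cfg["num_image_tokens"], cfg["vision_feature_select_strategy"])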
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "▁Salamandra",
6
+ "<image>",
7
+ "<video>"
8
+ ],
9
+ "bos_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<unk>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
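special_tokens_map.json registers the chat delimiters (<|im_start|>, <|im_end|>) and the multimodal placeholders (<image>, <video>) as additional special tokens on top of the usual <s>/</s>/<unk>. A quick sanity check, assuming the repo's tokenizer loads through AutoTokenizer:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("BSC-LT/salamandra-7b-vision")

# Each special token should resolve to a single, non-unknown id rather than
# being split into multiple pieces.
for t in ["<|im_start|>", "<|im_end|>", "<image>", "<video>"]:
    print(t, tok.convert_tokens_to_ids(t))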
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6cb95c9086ddffe8c6c6b0b65302c7e27d46125775281cf0918f39dbae4ff355
3
+ size 37007546
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c3242ecfbd7249f016348d75d9fe0656029643ae971533425091310510de48db
3
+ size 4813241
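tokenizer.json and tokenizer.model are committed as Git LFS pointers (the sha256/size stanzas above), so a plain clone without LFS support only yields these small text files. Libraries that resolve files through the Hub fetch the real blobs transparently; as a sketch using huggingface_hub:

from huggingface_hub import hf_hub_download

# Downloads the actual LFS object (~4.8 MB SentencePiece model), not the pointer file.
path = hf_hub_download(repo_id="BSC-LT/salamandra-7b-vision", filename="tokenizer.model")
print(path)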
video_processor/preprocessor_config.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "do_convert_rgb": true,
3
+ "do_normalize": true,
4
+ "do_rescale": true,
5
+ "do_resize": true,
6
+ "image_mean": [
7
+ 0.5,
8
+ 0.5,
9
+ 0.5
10
+ ],
11
+ "image_processor_type": "LlavaOnevisionVideoProcessor",
12
+ "image_std": [
13
+ 0.5,
14
+ 0.5,
15
+ 0.5
16
+ ],
17
+ "processor_class": "SiglipProcessor",
18
+ "resample": 3,
19
+ "rescale_factor": 0.00392156862745098,
20
+ "size": {
21
+ "height": 384,
22
+ "width": 384
23
+ }
24
+ }
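The video branch reuses the same 384-px, mean/std-0.5 normalization but without padding or any-resolution tiling, so every frame is resized to a single 384 × 384 tile. A minimal sketch, assuming the processor accepts a stack of frames through the videos argument (as the LLaVA-OneVision processor in recent transformers releases does) and using random frames in place of a decoded clip:

import numpy as np
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("BSC-LT/salamandra-7b-vision")

# Dummy 8-frame clip, shaped (frames, height, width, channels); a real pipeline
# would decode these frames from a video file.
frames = np.random.randint(0, 256, size=(8, 384, 384, 3), dtype=np.uint8)

inputs = processor(
    videos=frames,
    text="<video>\nWhat happens in the clip?",
    return_tensors="pt",
)
print(inputs.keys())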