FastFlowLM committed on
Commit
ef76ff9
·
verified ·
1 Parent(s): 166489c

Init upload

.gitattributes CHANGED
@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ attn.xclbin filter=lfs diff=lfs merge=lfs -text
37
+ dequant.xclbin filter=lfs diff=lfs merge=lfs -text
38
+ layer.xclbin filter=lfs diff=lfs merge=lfs -text
39
+ lm_head.xclbin filter=lfs diff=lfs merge=lfs -text
40
+ mm.xclbin filter=lfs diff=lfs merge=lfs -text
41
+ model.q4nx filter=lfs diff=lfs merge=lfs -text
42
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
43
+ vision_attn.xclbin filter=lfs diff=lfs merge=lfs -text
44
+ vision_mm.xclbin filter=lfs diff=lfs merge=lfs -text
45
+ vision_weight.q4nx filter=lfs diff=lfs merge=lfs -text
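The added `.gitattributes` entries route the listed files through Git LFS. As a rough sketch of how such entries can be inspected, the helper below lists every pattern that carries `filter=lfs`. The function name `lfs_patterns` and the sample text are illustrative assumptions, and the parser ignores quoted patterns containing spaces.

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First field is the pattern, the rest are attributes.
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Hypothetical sample modelled on the entries in this commit.
sample = """\
*.zip filter=lfs diff=lfs merge=lfs -text
model.q4nx filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md text
"""

print(lfs_patterns(sample))
```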
README.md CHANGED
@@ -1,3 +1,752 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: other
3
+ license_name: health-ai-developer-foundations
4
+ license_link: https://developers.google.com/health-ai-developer-foundations/terms
5
+ library_name: transformers
6
+ pipeline_tag: image-text-to-text
7
+ extra_gated_heading: Access MedGemma on Hugging Face
8
+ extra_gated_prompt: >-
9
+ To access MedGemma on Hugging Face, you're required to review and
10
+ agree to [Health AI Developer Foundation's terms of use](https://developers.google.com/health-ai-developer-foundations/terms).
11
+ To do this, please ensure you're logged in to Hugging Face and click below.
12
+ Requests are processed immediately.
13
+ extra_gated_button_content: Acknowledge license
14
+ base_model: google/medgemma-4b-pt
15
+ tags:
16
+ - medical
17
+ - radiology
18
+ - clinical-reasoning
19
+ - dermatology
20
+ - pathology
21
+ - ophthalmology
22
+ - chest-x-ray
23
+ ---
24
+
25
+ # MedGemma model card
26
+
27
+ **Model documentation:** [MedGemma](https://developers.google.com/health-ai-developer-foundations/medgemma)
28
+
29
+ **Resources:**
30
+
31
+ * Model on Google Cloud Model Garden: [MedGemma](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/medgemma)
32
+ * Model on Hugging Face: [MedGemma](https://huggingface.co/collections/google/medgemma-release-680aade845f90bec6a3f60c4)
33
+ * GitHub repository (supporting code, Colab notebooks, discussions, and
34
+ issues): [MedGemma](https://github.com/google-health/medgemma)
35
+ * Quick start notebook: [GitHub](https://github.com/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb)
36
+ * Fine-tuning notebook: [GitHub](https://github.com/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb)
37
+ * Concept applications built using MedGemma: [Collection](https://huggingface.co/collections/google/medgemma-concept-apps-686ea036adb6d51416b0928a)
38
+ * Support: See [Contact](https://developers.google.com/health-ai-developer-foundations/medgemma/get-started.md#contact)
39
+ * License: The use of MedGemma is governed by the [Health AI Developer
40
+ Foundations terms of
41
+ use](https://developers.google.com/health-ai-developer-foundations/terms).
42
+
43
+ **Author:** Google
44
+
45
+ ## Model information
46
+
47
+ This section describes the MedGemma model and how to use it.
48
+
49
+ ### Description
50
+
51
+ MedGemma is a collection of [Gemma 3](https://ai.google.dev/gemma/docs/core)
52
+ variants that are trained for performance on medical text and image
53
+ comprehension. Developers can use MedGemma to accelerate building
54
+ healthcare-based AI applications. MedGemma currently comes in three variants: a
55
+ 4B multimodal version and 27B text-only and multimodal versions.
56
+
57
+ Both MedGemma multimodal versions utilize a
58
+ [SigLIP](https://arxiv.org/abs/2303.15343) image encoder that has been
59
+ specifically pre-trained on a variety of de-identified medical data, including
60
+ chest X-rays, dermatology images, ophthalmology images, and histopathology
61
+ slides. Their LLM components are trained on a diverse set of medical data,
62
+ including medical text, medical question-answer pairs, FHIR-based electronic
63
+ health record data (27B multimodal only), radiology images, histopathology
64
+ patches, ophthalmology images, and dermatology images.
65
+
66
+ MedGemma 4B is available in both pre-trained (suffix: `-pt`) and
67
+ instruction-tuned (suffix: `-it`) versions. The instruction-tuned version is a
68
+ better starting point for most applications. The pre-trained version is
69
+ available for those who want to experiment more deeply with the models.
70
+
71
+ MedGemma 27B multimodal has been pre-trained on medical image and medical
72
+ record comprehension tasks. MedGemma 27B text-only has been trained
73
+ exclusively on medical text. Both models have been optimized for inference-time
74
+ computation on medical reasoning. The text-only variant has slightly higher
75
+ performance on some text benchmarks than MedGemma 27B multimodal. Users who
76
+ want a single model for medical text, medical record, and medical image tasks
77
+ are better served by MedGemma 27B multimodal. Those who only need text
78
+ use cases may be better served by the text-only variant. Both MedGemma 27B
79
+ variants are only available in instruction-tuned versions.
80
+
81
+ MedGemma variants have been evaluated on a range of clinically relevant
82
+ benchmarks to illustrate their baseline performance. These evaluations are based
83
+ on both open benchmark datasets and curated datasets. Developers can fine-tune
84
+ MedGemma variants for improved performance. Consult the [Intended
85
+ Use](https://developers.google.com/health-ai-developer-foundations/medgemma/model-card#intended_use)
86
+ section below for more details.
87
+
88
+ MedGemma is optimized for medical applications that involve a text generation
89
+ component. For medical image-based applications that do not involve text
90
+ generation, such as data-efficient classification, zero-shot classification, or
91
+ content-based or semantic image retrieval, the [MedSigLIP image
92
+ encoder](https://developers.google.com/health-ai-developer-foundations/medsiglip/model-card)
93
+ is recommended. MedSigLIP is based on the same image encoder that powers
94
+ MedGemma.
95
+
96
+ Please consult the [MedGemma Technical Report](https://arxiv.org/abs/2507.05201)
97
+ for more details.
98
+
99
+ ### How to use
100
+
101
+ Below are some example code snippets to help you quickly get started running the
102
+ model locally on GPU. If you want to use the model at scale, we recommend that
103
+ you create a production version using [Model
104
+ Garden](https://cloud.google.com/model-garden).
105
+
106
+ First, install the Transformers library. Gemma 3 is supported starting from
107
+ transformers 4.50.0.
108
+
109
+ ```sh
110
+ $ pip install -U transformers
111
+ ```
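Before running the snippets, it can help to confirm that the installed Transformers release meets the 4.50.0 minimum stated above. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `supports_gemma3` are hypothetical, and the parser deliberately handles only simple dotted versions (pre-release suffixes are ignored).

```python
from importlib import metadata

# Minimum release with Gemma 3 support, as stated in the model card.
MIN_TRANSFORMERS = (4, 50, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse the leading 'major.minor.patch' digits of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "4.50" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_gemma3(version: str) -> bool:
    """Return True if the version is at least 4.50.0."""
    return parse_version(version) >= MIN_TRANSFORMERS


try:
    installed = metadata.version("transformers")
    print(installed, supports_gemma3(installed))
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```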
112
+
113
+ **Run model with the `pipeline` API**
114
+
115
+ ```python
116
+ from transformers import pipeline
117
+ from PIL import Image
118
+ import requests
119
+ import torch
120
+
121
+ pipe = pipeline(
122
+     "image-text-to-text",
122
+     model="google/medgemma-4b-it",
123
+     torch_dtype=torch.bfloat16,
124
+     device="cuda",
126
+ )
127
+
128
+ # Image attribution: Stillwaterising, CC0, via Wikimedia Commons
129
+ image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
130
+ image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)
131
+
132
+ messages = [
133
+     {
134
+         "role": "system",
135
+         "content": [{"type": "text", "text": "You are an expert radiologist."}]
136
+     },
137
+     {
138
+         "role": "user",
139
+         "content": [
140
+             {"type": "text", "text": "Describe this X-ray"},
141
+             {"type": "image", "image": image}
142
+         ]
143
+     }
144
+ ]
145
+
146
+ output = pipe(text=messages, max_new_tokens=200)
147
+ print(output[0]["generated_text"][-1]["content"])
148
+ ```
149
+
150
+ **Run the model directly**
151
+
152
+ ```python
153
+ # pip install accelerate
154
+ from transformers import AutoProcessor, AutoModelForImageTextToText
155
+ from PIL import Image
156
+ import requests
157
+ import torch
158
+
159
+ model_id = "google/medgemma-4b-it"
160
+
161
+ model = AutoModelForImageTextToText.from_pretrained(
162
+     model_id,
163
+     torch_dtype=torch.bfloat16,
164
+     device_map="auto",
165
+ )
166
+ processor = AutoProcessor.from_pretrained(model_id)
167
+
168
+ # Image attribution: Stillwaterising, CC0, via Wikimedia Commons
169
+ image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
170
+ image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)
171
+
172
+ messages = [
173
+     {
174
+         "role": "system",
175
+         "content": [{"type": "text", "text": "You are an expert radiologist."}]
176
+     },
177
+     {
178
+         "role": "user",
179
+         "content": [
180
+             {"type": "text", "text": "Describe this X-ray"},
181
+             {"type": "image", "image": image}
182
+         ]
183
+     }
184
+ ]
185
+
186
+ inputs = processor.apply_chat_template(
187
+     messages, add_generation_prompt=True, tokenize=True,
188
+     return_dict=True, return_tensors="pt"
189
+ ).to(model.device, dtype=torch.bfloat16)
190
+
191
+ input_len = inputs["input_ids"].shape[-1]
192
+
193
+ with torch.inference_mode():
194
+     generation = model.generate(**inputs, max_new_tokens=200, do_sample=False)
195
+     generation = generation[0][input_len:]
196
+
197
+ decoded = processor.decode(generation, skip_special_tokens=True)
198
+ print(decoded)
199
+ ```
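Both snippets above build the same chat-style `messages` structure by hand. As a hedged convenience sketch, the helper below wraps that structure in a function so the prompt and system text can be varied. The name `build_messages` and its defaults are assumptions for illustration, not part of Transformers or MedGemma.

```python
from typing import Any


def build_messages(
    image: Any,
    prompt: str,
    system_prompt: str = "You are an expert radiologist.",
) -> list[dict]:
    """Build the system + user message list in the shape used above."""
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt}],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image", "image": image},
            ],
        },
    ]


# A placeholder object stands in for the PIL image loaded in the snippets above.
placeholder_image = object()
messages = build_messages(placeholder_image, "Describe this X-ray")
print(messages[1]["content"][0]["text"])
```

In real use, the `image` argument would be the `PIL.Image` object opened earlier, and the returned list would be passed to `pipe(text=messages, ...)` or `processor.apply_chat_template(messages, ...)`.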
200
+
201
+ ### Examples
202
+
203
+ See the following Colab notebooks for examples of how to use MedGemma:
204
+
205
+ * To give the model a quick try, running it locally with weights from Hugging
206
+ Face, see [Quick start notebook in
207
+ Colab](https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/quick_start_with_hugging_face.ipynb).
208
+ Note that you will need to use Colab Enterprise to obtain adequate GPU
209
+ resources to run either 27B model without quantization.
210
+
211
+ * For an example of fine-tuning the 4B model, see the [Fine-tuning notebook in
212
+ Colab](https://colab.research.google.com/github/google-health/medgemma/blob/main/notebooks/fine_tune_with_hugging_face.ipynb).
213
+ The 27B models can be fine-tuned in a similar manner but will require more
214
+ time and compute resources than the 4B model.
215
+
216
+ ### Model architecture overview
217
+
218
+ The MedGemma model is based on [Gemma 3](https://ai.google.dev/gemma/) and
219
+ uses the same decoder-only transformer architecture as Gemma 3\. To read more
220
+ about the architecture, consult the Gemma 3 [model
221
+ card](https://ai.google.dev/gemma/docs/core/model_card_3).
222
+
223
+ ### Technical specifications
224
+
225
+ * **Model type**: Decoder-only Transformer architecture, see the [Gemma 3
226
+ Technical
227
+ Report](https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf)
228
+ * **Input Modalities**: Text, vision
229
+ * **Output Modality:** Text only
230
+ * **Attention mechanism**: Grouped-query attention (GQA)
231
+ * **Context length**: Supports long context, at least 128K tokens
232
+ * **Key publication**: https://arxiv.org/abs/2507.05201
233
+ * **Model created**: July 9, 2025
234
+
235
+ * **Model version**: 1.0.1
236
+
237
+ ### Citation
238
+
239
+ When using this model, please cite: Sellergren et al. "MedGemma Technical
240
+ Report." *arXiv preprint arXiv:2507.05201* (2025).
241
+
242
+ ```none
243
+ @article{sellergren2025medgemma,
244
+ title={MedGemma Technical Report},
245
+ author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
246
+ journal={arXiv preprint arXiv:2507.05201},
247
+ year={2025}
248
+ }
249
+ ```
250
+
251
+ ### Inputs and outputs
252
+
253
+ **Input**:
254
+
255
+ * Text string, such as a question or prompt
256
+ * Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
257
+ * Total input length of 128K tokens
258
+
259
+ **Output**:
260
+
261
+ * Generated text in response to the input, such as an answer to a question,
262
+ analysis of image content, or a summary of a document
263
+ * Total output length of 8192 tokens
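The limits above imply a simple token budget: each image consumes 256 tokens of the input window. The sketch below works through that arithmetic. It assumes "128K" means 128 × 1024 tokens, which the card does not state explicitly, and `remaining_text_tokens` is a hypothetical helper name.

```python
TOKENS_PER_IMAGE = 256        # from the card: each image is encoded to 256 tokens
CONTEXT_TOKENS = 128 * 1024   # assumption: "128K" read as 131,072 tokens


def remaining_text_tokens(num_images: int, context_tokens: int = CONTEXT_TOKENS) -> int:
    """Return how many input tokens remain for text after the images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    used = num_images * TOKENS_PER_IMAGE
    if used > context_tokens:
        raise ValueError("images alone exceed the input window")
    return context_tokens - used


# Four images use 4 * 256 = 1,024 tokens of the input window.
print(remaining_text_tokens(4))
```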
264
+
265
+ ### Performance and validation
266
+
267
+ MedGemma was evaluated across a range of different multimodal classification,
268
+ report generation, visual question answering, and text-based tasks.
269
+
270
+ ### Key performance metrics
271
+
272
+ #### Imaging evaluations
273
+
274
+ The multimodal performance of MedGemma 4B and 27B multimodal was evaluated
275
+ across a range of benchmarks, focusing on radiology, dermatology,
276
+ histopathology, ophthalmology, and multimodal clinical reasoning.
277
+
278
+ MedGemma 4B outperforms the base Gemma 3 4B model across all tested multimodal
279
+ health benchmarks.
280
+
281
+ | Task and metric | Gemma 3 4B | MedGemma 4B |
282
+ | :---- | :---- | :---- |
283
+ | **Medical image classification** | | |
284
+ | MIMIC CXR\*\* \- macro F1 for top 5 conditions | 81.2 | 88.9 |
285
+ | CheXpert CXR \- macro F1 for top 5 conditions | 32.6 | 48.1 |
286
+ | CXR14 \- macro F1 for 3 conditions | 32.0 | 50.1 |
287
+ | PathMCQA\* (histopathology, internal\*\*) \- Accuracy | 37.1 | 69.8 |
288
+ | US-DermMCQA\* \- Accuracy | 52.5 | 71.8 |
289
+ | EyePACS\* (fundus, internal) \- Accuracy | 14.4 | 64.9 |
290
+ | **Visual question answering** | | |
291
+ | SLAKE (radiology) \- Tokenized F1 | 40.2 | 72.3 |
292
+ | VQA-RAD\*\*\* (radiology) \- Tokenized F1 | 33.6 | 49.9 |
293
+ | **Knowledge and reasoning** | | |
294
+ | MedXpertQA (text \+ multimodal questions) \- Accuracy | 16.4 | 18.8 |
295
+
296
+ \*Internal datasets. US-DermMCQA is described in [Liu (2020, Nature
297
+ medicine)](https://www.nature.com/articles/s41591-020-0842-3), presented as a
298
+ 4-way MCQ per example for skin condition classification. PathMCQA is based on
299
+ multiple datasets, presented as 3-9 way MCQ per example for identification,
300
+ grading, and subtype for breast, cervical, and prostate cancer. EyePACS is a
301
+ dataset of fundus images with classification labels based on 5-level diabetic
302
+ retinopathy severity (None, Mild, Moderate, Severe, Proliferative). More details
303
+ in the [MedGemma Technical Report](https://arxiv.org/abs/2507.05201).
304
+
305
+ \*\*Based on radiologist adjudicated labels, described in [Yang (2024,
306
+ arXiv)](https://arxiv.org/pdf/2405.03162) Section A.1.1.
307
+
308
+ \*\*\*Based on "balanced split," described in [Yang (2024,
309
+ arXiv)](https://arxiv.org/pdf/2405.03162).
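Several rows in the table report macro F1, which averages the per-condition F1 scores so that each condition counts equally. The card does not publish its evaluation code, so the following is only an illustrative sketch that assumes binary (0/1) labels per condition. The function names and the toy data are hypothetical.

```python
def f1_binary(y_true: list[int], y_pred: list[int]) -> float:
    """F1 score for one condition with binary labels (1 = present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0 when there are no positives at all.
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(labels: dict[str, list[int]], preds: dict[str, list[int]]) -> float:
    """Unweighted mean of per-condition F1 scores."""
    scores = [f1_binary(labels[name], preds[name]) for name in labels]
    return sum(scores) / len(scores)


# Toy data for two made-up conditions over four images.
toy_labels = {"condition_a": [1, 0, 1, 1], "condition_b": [0, 0, 1, 0]}
toy_preds = {"condition_a": [1, 0, 0, 1], "condition_b": [0, 1, 1, 0]}

print(round(macro_f1(toy_labels, toy_preds), 4))
```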
310
+
311
+ #### Chest X-ray report generation
312
+
313
+ MedGemma chest X-ray (CXR) report generation performance was evaluated on
314
+ [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.1.0/) using the [RadGraph
315
+ F1 metric](https://arxiv.org/abs/2106.14463). We compare the MedGemma
316
+ pre-trained checkpoint with our previous best model for CXR report generation,
317
+ [PaliGemma 2](https://arxiv.org/abs/2412.03555).
318
+
319
+ | Metric | MedGemma 4B (pre-trained) | MedGemma 4B (tuned for CXR) | PaliGemma 2 3B (tuned for CXR) | PaliGemma 2 10B (tuned for CXR) |
320
+ | :---- | :---- | :---- | :---- | :---- |
321
+ | MIMIC CXR \- RadGraph F1 | 29.5 | 30.3 | 28.8 | 29.5 |
322
+
323
+
324
+
325
+ The instruction-tuned versions of MedGemma 4B and MedGemma 27B achieve lower
326
+ scores (21.9 and 21.3, respectively) due to the differences in reporting style
327
+ compared to the MIMIC ground truth reports. Further fine-tuning on MIMIC reports
328
+ enables users to achieve improved performance, as shown by the improved
329
+ performance of the MedGemma 4B model that was tuned for CXR.
330
+
331
+ #### Text evaluations
332
+
333
+ MedGemma 4B and text-only MedGemma 27B were evaluated across a range of
334
+ text-only benchmarks for medical knowledge and reasoning.
335
+
336
+ The MedGemma models outperform their respective base Gemma models across all
337
+ tested text-only health benchmarks.
338
+
339
+ | Metric | Gemma 3 4B | MedGemma 4B |
340
+ | :---- | :---- | :---- |
341
+ | MedQA (4-op) | 50.7 | 64.4 |
342
+ | MedMCQA | 45.4 | 55.7 |
343
+ | PubMedQA | 68.4 | 73.4 |
344
+ | MMLU Med | 67.2 | 70.0 |
345
+ | MedXpertQA (text only) | 11.6 | 14.2 |
346
+ | AfriMed-QA (25 question test set) | 48.0 | 52.0 |
347
+
348
+ For all MedGemma 27B results, [test-time
349
+ scaling](https://arxiv.org/abs/2501.19393) is used to improve performance.
350
+
351
+ #### Medical record evaluations
352
+
353
+ All models were evaluated on a question answer dataset from synthetic FHIR data
354
+ to answer questions about patient records. MedGemma 27B multimodal's
355
+ FHIR-specific training gives it significant improvement over other MedGemma and
356
+ Gemma models.
357
+
358
+ | Metric | Gemma 3 4B | MedGemma 4B |
359
+ | :---- | :---- | :---- |
360
+ | EHRQA | 70.9 | 67.6 |
361
+
362
+
363
+ ### Ethics and safety evaluation
364
+
365
+ #### Evaluation approach
366
+
367
+ Our evaluation methods include structured evaluations and internal red-teaming
368
+ testing of relevant content policies. Red-teaming was conducted by a number of
369
+ different teams, each with different goals and human evaluation metrics. These
370
+ models were evaluated against a number of different categories relevant to
371
+ ethics and safety, including:
372
+
373
+ * **Child safety**: Evaluation of text-to-text and image-to-text prompts
374
+ covering child safety policies, including child sexual abuse and
375
+ exploitation.
376
+ * **Content safety:** Evaluation of text-to-text and image-to-text prompts
377
+ covering safety policies, including harassment, violence and gore, and hate
378
+ speech.
379
+ * **Representational harms**: Evaluation of text-to-text and image-to-text
380
+ prompts covering safety policies, including bias, stereotyping, and harmful
381
+ associations or inaccuracies.
382
+ * **General medical harms:** Evaluation of text-to-text and image-to-text
383
+ prompts covering safety policies, including information quality and harmful
384
+ associations or inaccuracies.
385
+
386
+ In addition to development level evaluations, we conduct "assurance evaluations"
387
+ which are our "arms-length" internal evaluations for responsibility governance
388
+ decision making. They are conducted separately from the model development team,
389
+ to inform decision making about release. High-level findings are fed back to the
390
+ model team, but prompt sets are held out to prevent overfitting and preserve the
391
+ results' ability to inform decision making. Notable assurance evaluation results
392
+ are reported to our Responsibility & Safety Council as part of release review.
393
+
394
+ #### Evaluation results
395
+
396
+ For all areas of safety testing, we saw safe levels of performance across the
397
+ categories of child safety, content safety, and representational harms. All
398
+ testing was conducted without safety filters to evaluate the model capabilities
399
+ and behaviors. For text-to-text, image-to-text, and audio-to-text, and across
400
+ both MedGemma model sizes, the model produced minimal policy violations. A
401
+ limitation of our evaluations was that they included primarily English language
402
+ prompts.
403
+
404
+ ## Data card
405
+
406
+ ### Dataset overview
407
+
408
+ #### Training
409
+
410
+ The base Gemma models are pre-trained on a large corpus of text and code data.
411
+ MedGemma 4B utilizes a [SigLIP](https://arxiv.org/abs/2303.15343) image encoder
412
+ that has been specifically pre-trained on a variety of de-identified medical
413
+ data, including radiology images, histopathology images, ophthalmology images,
414
+ and dermatology images. Its LLM component is trained on a diverse set of medical
415
+ data, including medical text relevant to radiology images, chest X-rays,
416
+ histopathology patches, ophthalmology images and dermatology images.
417
+
418
+ #### Evaluation
419
+
420
+ MedGemma models have been evaluated on a comprehensive set of clinically
421
+ relevant benchmarks, including over 22 datasets across 5 different tasks and 6
422
+ medical image modalities. These include both open benchmark datasets and curated
423
+ datasets, with a focus on expert human evaluations for tasks like CXR report
424
+ generation and radiology VQA.
425
+
467
+ ## Data card
468
+
469
+ ### Dataset overview
470
+
471
+ #### Training
472
+
473
+ The base Gemma models are pre-trained on a large corpus of text and code data.
474
+ MedGemma multimodal variants utilize a
475
+ [SigLIP](https://arxiv.org/abs/2303.15343) image encoder that has been
476
+ specifically pre-trained on a variety of de-identified medical data, including
477
+ radiology images, histopathology images, ophthalmology images, and dermatology
478
+ images. Their LLM components are trained on a diverse set of medical data,
479
+ including medical text, medical question-answer pairs, FHIR-based electronic
480
+ health record data (27B multimodal only), radiology images, histopathology
481
+ patches, ophthalmology images, and dermatology images.
482
+
483
+ #### Evaluation
484
+
485
+ MedGemma models have been evaluated on a comprehensive set of clinically
486
+ relevant benchmarks, including over 22 datasets across 6 different tasks and 4
487
+ medical image modalities. These benchmarks include both open and internal
488
+ datasets.
489
+
490
+ #### Source
491
+
492
+ MedGemma utilizes a combination of public and private datasets.
493
+
494
+ This model was trained on diverse public datasets including MIMIC-CXR (chest
495
+ X-rays and reports), Chest ImaGenome (bounding boxes linking image
496
+ findings with anatomical regions for MIMIC-CXR; MedGemma 27B multimodal only),
497
+ SLAKE (multimodal medical images and questions), PAD-UFES-20 (skin lesion images
498
+ and data), SCIN (dermatology images), TCGA (cancer genomics data), CAMELYON
499
+ (lymph node histopathology images), PMC-OA (biomedical literature with images),
500
+ and Mendeley Digital Knee X-Ray (knee X-rays).
501
+
502
+ Additionally, multiple diverse proprietary datasets were licensed and
503
+ incorporated (described next).
504
+
505
+ ### Data Ownership and Documentation
506
+
507
+ * [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.1.0/): MIT Laboratory
508
+ for Computational Physiology and Beth Israel Deaconess Medical Center
509
+ (BIDMC).
510
+ * [Slake-VQA](https://www.med-vqa.com/slake/): The Hong Kong Polytechnic
511
+ University (PolyU), with collaborators including West China Hospital of
512
+ Sichuan University and Sichuan Academy of Medical Sciences / Sichuan
513
+ Provincial People's Hospital.
514
+ * [PAD-UFES-20](https://pmc.ncbi.nlm.nih.gov/articles/PMC7479321/): Federal
515
+ University of Espírito Santo (UFES), Brazil, through its Dermatological and
516
+ Surgical Assistance Program (PAD).
517
+ * [SCIN](https://github.com/google-research-datasets/scin): A collaboration
518
+ between Google Health and Stanford Medicine.
519
+ * [TCGA](https://portal.gdc.cancer.gov/) (The Cancer Genome Atlas): A joint
520
+ effort of National Cancer Institute and National Human Genome Research
521
+ Institute. Data from TCGA are available via the Genomic Data Commons (GDC).
522
+ * [CAMELYON](https://camelyon17.grand-challenge.org/Data/): The data was
523
+ collected from Radboud University Medical Center and University Medical
524
+ Center Utrecht in the Netherlands.
525
+ * [PMC-OA (PubMed Central Open Access
526
+ Subset)](https://catalog.data.gov/dataset/pubmed-central-open-access-subset-pmc-oa):
527
+ Maintained by the National Library of Medicine (NLM) and National Center for
528
+ Biotechnology Information (NCBI), which are part of the NIH.
529
+ * [MedQA](https://arxiv.org/pdf/2009.13081): This dataset was created by a
530
+ team of researchers led by Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung
531
+ Weng, Hanyi Fang, and Peter Szolovits.
532
+ * [Mendeley Digital Knee
533
+ X-Ray](https://data.mendeley.com/datasets/t9ndx37v5h/1): This dataset is
534
+ from Rani Channamma University, and is hosted on Mendeley Data.
535
+ * [AfriMed-QA](https://afrimedqa.com/): This dataset was developed by
536
+ multiple collaborating organizations and researchers. Key contributors
537
+ include Intron Health, SisonkeBiotik, BioRAMP, Georgia Institute of
538
+ Technology, and MasakhaneNLP.
539
+ * [VQA-RAD](https://www.nature.com/articles/sdata2018251): This dataset was
540
+ created by a research team led by Jason J. Lau, Soumya Gayen, Asma Ben
541
+ Abacha, and Dina Demner-Fushman and their affiliated institutions (the US
542
+ National Library of Medicine and National Institutes of Health).
543
+ * [Chest ImaGenome](https://physionet.org/content/chest-imagenome/1.0.0/): IBM
544
+ Research.
545
+ * [MedExpQA](https://www.sciencedirect.com/science/article/pii/S0933365724001805):
546
+ This dataset was created by researchers at the HiTZ Center (Basque Center
547
+ for Language Technology and Artificial Intelligence).
548
+ * [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA): This
549
+ dataset was developed by researchers at Tsinghua University (Beijing, China)
550
+ and Shanghai Artificial Intelligence Laboratory (Shanghai, China).
551
+ * [HealthSearchQA](https://huggingface.co/datasets/katielink/healthsearchqa):
552
+ This dataset consists of 3,173 commonly searched consumer
553
+ questions.
554
+
555
+ In addition to the public datasets listed above, MedGemma was also trained on
556
+ de-identified, licensed datasets or datasets collected internally at Google from
557
+ consented participants.
558
+
559
+ * **Radiology dataset 1:** De-identified dataset of different CT studies
560
+ across body parts from a US-based radiology outpatient diagnostic center
561
+ network.
562
+ * **Ophthalmology dataset 1 (EyePACS):** De-identified dataset of fundus
563
+ images from diabetic retinopathy screening.
564
+ * **Dermatology dataset 1:** De-identified dataset of teledermatology skin
565
+ condition images (both clinical and dermatoscopic) from Colombia.
566
+ * **Dermatology dataset 2:** De-identified dataset of skin cancer images (both
567
+ clinical and dermatoscopic) from Australia.
568
+ * **Dermatology dataset 3:** De-identified dataset of non-diseased skin images
569
+ from an internal data collection effort.
570
+ * **Pathology dataset 1:** De-identified dataset of histopathology H\&E whole
571
+ slide images created in collaboration with an academic research hospital and
572
+ biobank in Europe. Comprises de-identified colon, prostate, and lymph nodes.
573
+ * **Pathology dataset 2:** De-identified dataset of lung histopathology H\&E
574
+ and IHC whole slide images created by a commercial biobank in the United
575
+ States.
576
+ * **Pathology dataset 3:** De-identified dataset of prostate and lymph node
577
+ H\&E and IHC histopathology whole slide images created by a contract
578
+ research organization in the United States.
579
+ * **Pathology dataset 4:** De-identified dataset of histopathology whole slide
580
+ images created in collaboration with a large, tertiary teaching hospital in
581
+ the United States. Comprises a diverse set of tissue and stain types,
582
+ predominantly H\&E.
583
+ * **EHR dataset 1:** Question/answer dataset drawn from synthetic FHIR records
584
+ created by [Synthea.](https://synthetichealth.github.io/synthea/) The test
585
+ set includes 19 unique patients with 200 questions per patient divided into
586
+ 10 different categories.
587
+
588
+ ### Data citation
589
+
590
+ * **MIMIC-CXR:** Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng,
591
+ S. (2024). MIMIC-CXR Database (version 2.1.0). PhysioNet.
592
+ [https://physionet.org/content/mimic-cxr/2.1.0/](https://physionet.org/content/mimic-cxr/2.1.0/)
593
+ *and* Johnson, Alistair E. W., Tom J. Pollard, Seth J. Berkowitz, Nathaniel
594
+ R. Greenbaum, Matthew P. Lungren, Chih-Ying Deng, Roger G. Mark, and Steven
595
+ Horng. 2019\. "MIMIC-CXR, a de-Identified Publicly Available Database of
596
+ Chest Radiographs with Free-Text Reports." *Scientific Data 6* (1): 1–8.
597
+
598
+ * **SLAKE:** Liu, Bo, Li-Ming Zhan, Li Xu, Lin Ma, Yan Yang, and Xiao-Ming Wu.
599
+ 2021\. "SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical
600
+ Visual Question Answering."
601
+ [http://arxiv.org/abs/2102.09542](http://arxiv.org/abs/2102.09542).
602
+
603
+ * **PAD-UFES-20:** Pacheco, Andre GC, et al. "PAD-UFES-20: A skin lesion
604
+ dataset composed of patient data and clinical images collected from
605
+ smartphones." *Data in brief* 32 (2020): 106221\.
606
+
607
+ * **SCIN:** Ward, Abbi, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley
608
+ Carrick, Bilson Campana, Jay Hartford, et al. 2024\. "Creating an Empirical
609
+ Dermatology Dataset Through Crowdsourcing With Web Search Advertisements."
610
+ *JAMA Network Open 7* (11): e2446615–e2446615.
611
+
612
+ * **TCGA:** The results shown here are in whole or part based upon data
613
+ generated by the TCGA Research Network:
614
+ [https://www.cancer.gov/tcga](https://www.cancer.gov/tcga).
615
+
616
+ * **CAMELYON16:** Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van
617
+ Diest, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen A. W. M.
618
+ van der Laak, et al. 2017\. "Diagnostic Assessment of Deep Learning
619
+ Algorithms for Detection of Lymph Node Metastases in Women With Breast
620
+ Cancer." *JAMA 318* (22): 2199–2210.
621
+
622
+ * **Mendeley Digital Knee X-Ray:** Gornale, Shivanand; Patravali, Pooja
623
+ (2020), "Digital Knee X-ray Images", Mendeley Data, V1, doi:
624
+ 10.17632/t9ndx37v5h.1
625
+
626
+ * **VQA-RAD:** Lau, Jason J., Soumya Gayen, Asma Ben Abacha, and Dina
627
+ Demner-Fushman. 2018\. "A Dataset of Clinically Generated Visual Questions
628
+ and Answers about Radiology Images." *Scientific Data 5* (1): 1–10.
629
+
630
+ * **Chest ImaGenome:** Wu, J., Agu, N., Lourentzou, I., Sharma, A., Paguio,
631
+ J., Yao, J. S., Dee, E. C., Mitchell, W., Kashyap, S., Giovannini, A., Celi,
632
+ L. A., Syeda-Mahmood, T., & Moradi, M. (2021). Chest ImaGenome Dataset
633
+ (version 1.0.0). PhysioNet. RRID:SCR\_007345.
634
+ [https://doi.org/10.13026/wv01-y230](https://doi.org/10.13026/wv01-y230)
635
+
636
+ * **MedQA:** Jin, Di, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang,
637
+ and Peter Szolovits. 2020\. "What Disease Does This Patient Have? A
638
+ Large-Scale Open Domain Question Answering Dataset from Medical Exams."
639
+ [http://arxiv.org/abs/2009.13081](http://arxiv.org/abs/2009.13081).
640
+
641
+ * **AfriMed-QA:** Olatunji, Tobi, Charles Nimo, Abraham Owodunni, Tassallah
642
+ Abdullahi, Emmanuel Ayodele, Mardhiyah Sanni, Chinemelu Aka, et al. 2024\.
643
+ "AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering
644
+ Benchmark Dataset."
645
+ [http://arxiv.org/abs/2411.15640](http://arxiv.org/abs/2411.15640).
646
+
647
+ * **MedExpQA:** Alonso, I., Oronoz, M., & Agerri, R. (2024). MedExpQA:
648
+ Multilingual Benchmarking of Large Language Models for Medical Question
649
+ Answering. *arXiv preprint arXiv:2404.05590*. Retrieved from
650
+ [https://arxiv.org/abs/2404.05590](https://arxiv.org/abs/2404.05590)
651
+
652
+ * **MedXpertQA:** Zuo, Yuxin, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu,
653
+ Ermo Hua, Kaiyan Zhang, Ning Ding, and Bowen Zhou. 2025\. "MedXpertQA:
654
+ Benchmarking Expert-Level Medical Reasoning and Understanding."
655
+ [http://arxiv.org/abs/2501.18362](http://arxiv.org/abs/2501.18362).
656
+
657
+ ### De-identification/anonymization
658
+
659
+ Google and its partners utilize datasets that have been rigorously anonymized or
660
+ de-identified to ensure the protection of individual research participants and
661
+ patient privacy.
662
+
663
+ ## Implementation information
664
+
665
+ Details about the model internals.
666
+
667
+ ### Software
668
+
669
+ Training was done using [JAX](https://github.com/jax-ml/jax).
670
+
671
+ JAX allows researchers to take advantage of the latest generation of hardware,
672
+ including TPUs, for faster and more efficient training of large models.
673
+
674
+ ## Use and limitations
675
+
676
+ ### Intended use
677
+
678
+ MedGemma is an open multimodal generative AI model intended to be used as a
679
+ starting point that enables more efficient development of downstream healthcare
680
+ applications involving medical text and images. MedGemma is intended for
681
+ developers in the life sciences and healthcare space. Developers are responsible
682
+ for training, adapting and making meaningful changes to MedGemma to accomplish
683
+ their specific intended use. MedGemma models can be fine-tuned by developers
684
+ using their own proprietary data for their specific tasks or solutions.
685
+
686
+ MedGemma is based on Gemma 3 and has been further trained on medical images and
687
+ text. MedGemma enables further development in any medical context (image and
688
+ textual); however, the model was pre-trained using chest X-ray, pathology,
689
+ dermatology, and fundus images. Examples of tasks within MedGemma's training
690
+ include visual question answering pertaining to medical images, such as
691
+ radiographs, or providing answers to textual medical questions. Full details of
692
+ all the tasks on which MedGemma has been evaluated can be found in the [MedGemma
693
+ Technical Report](https://arxiv.org/abs/2507.05201).
694
+
695
+ ### Benefits
696
+
697
+ * Provides strong baseline medical image and text comprehension for models of
698
+ its size.
699
+ * This strong performance makes it efficient to adapt for downstream
700
+ healthcare-based use cases, compared to models of similar size without
701
+ medical data pre-training.
702
+ * This adaptation may involve prompt engineering, grounding, agentic
703
+ orchestration or fine-tuning depending on the use case, baseline validation
704
+ requirements, and desired performance characteristics.
705
+
706
+ ### Limitations
707
+
708
+ MedGemma is not intended to be used without appropriate validation, adaptation
709
+ and/or meaningful modification by developers for their specific use case.
710
+ The outputs generated by MedGemma are not intended to directly inform clinical
711
+ diagnosis, patient management decisions, treatment recommendations, or any other
712
+ direct clinical practice applications. Performance benchmarks highlight baseline
713
+ capabilities on relevant benchmarks, but even for image and text domains that
714
+ constitute a substantial portion of training data, inaccurate model output is
715
+ possible. All outputs from MedGemma should be considered preliminary and require
716
+ independent verification, clinical correlation, and further investigation
717
+ through established research and development methodologies.
718
+
719
+ MedGemma's multimodal capabilities have been primarily evaluated on single-image
720
+ tasks. MedGemma has not been evaluated in use cases that involve comprehension
721
+ of multiple images.
722
+
723
+ MedGemma has not been evaluated or optimized for multi-turn applications.
724
+
725
+ MedGemma's training may make it more sensitive to the specific prompt used than
726
+ Gemma 3\.
727
+
728
+ When adapting MedGemma, developers should consider the following:
729
+
730
+ * **Bias in validation data:** As with any research, developers should ensure
731
+ that any downstream application is validated to understand performance using
732
+ data that is appropriately representative of the intended use setting for
733
+ the specific application (e.g., age, sex, gender, condition, imaging device,
734
+ etc.).
735
+ * **Data contamination concerns:** When evaluating the generalization
736
+ capabilities of a large model like MedGemma in a medical context, there is a
737
+ risk of data contamination, where the model might have inadvertently seen
738
+ related medical information during its pre-training, potentially
739
+ overestimating its true ability to generalize to novel medical concepts.
740
+ Developers should validate MedGemma on datasets not publicly available or
741
+ otherwise made available to non-institutional researchers to mitigate this
742
+ risk.
743
+
744
+
745
+ ### Release notes
746
+
747
+ * May 20, 2025: Initial Release
748
+ * July 9, 2025: Bug fix. Fixed a subtle degradation in multimodal
749
+ performance. The issue was due to a missing end-of-image token in the model
750
+ vocabulary, impacting combined text-and-image tasks. This fix reinstates and
751
+ correctly maps that token, ensuring text-only tasks remain unaffected while
752
+ restoring multimodal performance.
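
The July 9 fix concerns a token that was missing from the vocabulary. A minimal sketch of how one might check for such a token in a `tokenizer.json`-style dictionary follows. The token string `<end_of_image>`, the helper name `find_token_id`, and the toy ids are assumptions for illustration; they are not taken from this repository's files.

```python
def find_token_id(tokenizer, token):
    """Return the id of `token` in a tokenizers-format dict, or None if absent."""
    # Special tokens are usually listed under "added_tokens".
    for entry in tokenizer.get("added_tokens", []):
        if entry.get("content") == token:
            return entry.get("id")
    vocab = tokenizer.get("model", {}).get("vocab", {})
    if isinstance(vocab, dict):
        # BPE/WordPiece-style vocab: a mapping from token string to id.
        return vocab.get(token)
    # Unigram-style vocab: a list of [token, score] pairs; the id is the position.
    for idx, (piece, _score) in enumerate(vocab):
        if piece == token:
            return idx
    return None


# Toy stand-in for a parsed tokenizer.json (hypothetical ids).
toy_tokenizer = {
    "added_tokens": [{"id": 7, "content": "<start_of_image>"}],
    "model": {"vocab": {"hello": 0, "world": 1}},
}

EOI_TOKEN = "<end_of_image>"  # ASSUMPTION: name of the end-of-image token
missing = find_token_id(toy_tokenizer, EOI_TOKEN) is None
```

With a real file, the dictionary would come from `json.load` on the downloaded `tokenizer.json`; a `None` result for the end-of-image token would point to the kind of gap the release note describes.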
attn.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ad5ee7122d17040e7cbde995fe3a13fb38c77cd64921605c9b61e703fe85e070
3
+ size 464683
config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "attention_bias": false,
3
+ "attention_dropout": 0.0,
4
+ "attn_logit_softcapping": null,
5
+ "cache_implementation": "hybrid",
6
+ "final_logit_softcapping": null,
7
+ "head_dim": 256,
8
+ "hidden_activation": "gelu_pytorch_tanh",
9
+ "hidden_size": 2560,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 10240,
12
+ "max_position_embeddings": 131072,
13
+ "model_type": "gemma3_text",
14
+ "num_attention_heads": 8,
15
+ "num_hidden_layers": 34,
16
+ "num_key_value_heads": 4,
17
+ "query_pre_attn_scalar": 256,
18
+ "rms_norm_eps": 1e-06,
19
+ "rope_local_base_freq": 10000.0,
20
+ "rope_scaling": {
21
+ "factor": 8.0,
22
+ "rope_type": "linear"
23
+ },
24
+ "rope_theta": 1000000.0,
25
+ "sliding_window": 1024,
26
+ "sliding_window_pattern": 6,
27
+ "torch_dtype": "bfloat16",
28
+ "use_cache": true,
29
+ "vocab_size": 262208,
30
+ "addr_qk": 9216,
31
+ "addr_kv": 33792,
32
+ "addr_l_begin_mha": 53760,
33
+ "addr_l_end_mha": 25088,
34
+ "addr_kk": 45056,
35
+ "flm_version": "0.9.5",
36
+ "vision_model_weight": "vision_weight.q4nx",
37
+ "vision_mm_engine_xclbin_name": "vision_mm.xclbin",
38
+ "vision_mha_engine_xclbin_name":"vision_attn.xclbin",
39
+ "vision_conv2d_stride": 14,
40
+ "vision_conv2d_padding" : 0,
41
+ "vision_conv2d_kernel" : 14,
42
+ "vision_conv2d_Cin": 3,
43
+ "vision_conv2d_Cout": 1152,
44
+ "vision_average_pooling_kernel": 4,
45
+ "vision_average_pooling_stride" : 4,
46
+ "vision_average_pooling_padding" : 0,
47
+ "vision_layer_norm_eps" : 1e-06,
48
+ "vision_rms_norm_eps" : 1e-06,
49
+ "vision_intermediate_size": 4304,
50
+ "vision_hidden_size" : 1152,
51
+ "vision_head_dim": 72,
52
+ "vision_num_attention_heads": 16,
53
+ "vision_num_key_value_heads": 16,
54
+ "vision_num_hidden_layers": 27
55
+
56
+ }
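
The following sketch derives a few shapes implied by the values in the `config.json` above. It is illustrative only. The 896x896 input size is an assumption (the config does not state an image size), and reading `sliding_window_pattern` as "every 6th layer uses global attention" follows the usual Gemma 3 convention, which may not match how the FastFlowLM runtime uses the field.

```python
# Values copied from the config.json in this commit.
cfg = {
    "hidden_size": 2560,
    "head_dim": 256,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "num_hidden_layers": 34,
    "sliding_window_pattern": 6,
    "vision_conv2d_kernel": 14,
    "vision_conv2d_stride": 14,
    "vision_conv2d_padding": 0,
    "vision_average_pooling_kernel": 4,
    "vision_average_pooling_stride": 4,
    "vision_average_pooling_padding": 0,
}


def out_size(n, kernel, stride, padding):
    """Output length along one axis of a strided convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def image_tokens(image_side, cfg):
    """Return (patch count, pooled token count) for a square input image."""
    patches = out_size(image_side, cfg["vision_conv2d_kernel"],
                       cfg["vision_conv2d_stride"], cfg["vision_conv2d_padding"])
    pooled = out_size(patches, cfg["vision_average_pooling_kernel"],
                      cfg["vision_average_pooling_stride"],
                      cfg["vision_average_pooling_padding"])
    return patches * patches, pooled * pooled


# ASSUMPTION: 896x896 input; not stated in config.json.
n_patches, n_image_tokens = image_tokens(896, cfg)

# Grouped-query attention: query heads that share each key/value head.
q_width = cfg["num_attention_heads"] * cfg["head_dim"]
kv_width = cfg["num_key_value_heads"] * cfg["head_dim"]
gqa_group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]

# ASSUMPTION: layer i (0-based) is global when (i + 1) is a multiple of the pattern.
global_layers = [i for i in range(cfg["num_hidden_layers"])
                 if (i + 1) % cfg["sliding_window_pattern"] == 0]
local_layer_count = cfg["num_hidden_layers"] - len(global_layers)
```

Under these assumptions, the query projection width (`q_width`) differs from `hidden_size`, so the attention output projection maps back from `q_width` to `hidden_size`.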
dequant.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a1be8893482f4d6f3634621e9f1c52355ce082cb768300ead02a54b05026b74c
3
+ size 115179
layer.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a8662204bd8636a57e2ee298dff9e638e8ad257866f218cd5a7b0e3bea34f125
3
+ size 282955
lm_head.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dd505faa35eb3c528ed492cb62cbc52b3f808a94896d79fe84a4bd426b209d7f
3
+ size 153355
mm.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1e88cd899e3fbef3b402ecaf2ab644f21cca2ee85dafe6e2521d89222c5a644
3
+ size 347675
model.q4nx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06cb8f39d3f84850c3e1938de304605cf842fddbb5fef946fa88140d811ff117
3
+ size 3768226744
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4667f2089529e8e7657cfb6d1c19910ae71ff5f28aa7ab2ff2763330affad795
3
+ size 33384568
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
vision_attn.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fcbdc555bc96e2f9e2a9370eee37a5d179676d36c9e5ffd8b4a84617ea62c87b
3
+ size 515579
vision_mm.xclbin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2afec6607276d44e58d845ccbd63c3170442b96871dd91675062753049dda1e7
3
+ size 186395
vision_weight.q4nx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a60fad0894239350a7bfd3279923636b89e76e6231ac3cadc15793703024be14
3
+ size 1844564176
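
Each `.xclbin` and `.q4nx` entry above is a Git LFS pointer (a `version` line, an `oid sha256:` line, and a `size` line) rather than the binary itself. As a rough sketch, a downloaded file could be checked against its pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are made up for this example and are not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte weight files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer text for vision_weight.q4nx, copied from the diff above.
pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:a60fad0894239350a7bfd3279923636b89e76e6231ac3cadc15793703024be14
size 1844564176
""")
```

Calling `matches_pointer("vision_weight.q4nx", pointer)` on a local copy would then report whether the download is complete and uncorrupted; the local path is a placeholder.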