Vision not working in ollama
#4 · opened by ocarson
The GGUFs hosted here don't seem to work in Ollama with images, while the version on Ollama's own site does. Any way to fix this?
Do you know if it works on llama.cpp? :)
+1
> Do you know if it works on llama.cpp? :)
It works on my machine:
llama-gemma3-cli -m gemma-3-4b-it-Q2_K.gguf --mmproj mmproj-BF16.gguf --image some.png -p "What can you see?" -ngl 34
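(For context: llama.cpp loads the text model and the vision projector as two separate files. -m points at the language-model GGUF, --mmproj at the projector GGUF, --image at the input picture, and -ngl 34 offloads 34 layers to the GPU.)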
No, it's not working in Ollama, with or without Open WebUI.

Without Open WebUI:
$ ollama run hf.co/unsloth/gemma-3-4b-it-GGUF:Q2_K_L
pulling manifest
pulling 84995a47f7a2... 100% ▕████████████████████████████████████████████████████████████▏ 1.9 GB
pulling e0a42594d802... 100% ▕████████████████████████████████████████████████████████████▏ 358 B
pulling dfd94f00498e... 100% ▕████████████████████████████████████████████████████████████▏ 851 MB
pulling 39bfa1773b74... 100% ▕████████████████████████████████████████████████████████████▏ 201 B
pulling 6b67f0f2e01a... 100% ▕████████████████████████████████████████████████████████████▏ 195 B
verifying sha256 digest
writing manifest
success
>>> /home/bkjzon/Pictures/c7ed2064f5aaa6ceaeec216f3bfddd9d22a444ab-2913263995.png
Added image '/home/bkjzon/Pictures/c7ed2064f5aaa6ceaeec216f3bfddd9d22a444ab-2913263995.png'
Error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input
Here is the log from journalctl -u ollama -b (Arch Linux):
systemd[1]: Started Ollama Service.
ollama[22060]: 2025/04/11 14:49:31 routes.go:1231: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
ollama[22060]: time=2025-04-11T14:49:31.920+08:00 level=INFO source=images.go:458 msg="total blobs: 37"
ollama[22060]: time=2025-04-11T14:49:31.920+08:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
ollama[22060]: time=2025-04-11T14:49:31.921+08:00 level=INFO source=routes.go:1298 msg="Listening on 127.0.0.1:11434 (version 0.6.5)"
ollama[22060]: time=2025-04-11T14:49:31.921+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
ollama[22060]: time=2025-04-11T14:49:32.082+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-8b6264f1-0cf1-ba9f-4441-21d371506733 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060 Laptop GPU" total="7.6 GiB" available="7.2 GiB"
ollama[22060]: [GIN] 2025/04/11 - 16:41:08 | 200 | 11.166505ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 16:41:08 | 200 | 115.634µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 16:41:17 | 200 | 1.355552ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 17:17:54 | 200 | 204.293µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 17:38:53 | 200 | 12.776021ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 17:38:53 | 200 | 57.172µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 17:38:59 | 200 | 2.086745ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 18:26:43 | 200 | 9.490859ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: time=2025-04-11T18:26:56.027+08:00 level=INFO source=download.go:177 msg="downloading 07f73c8ce8af in 16 108 MB part(s)"
ollama[22060]: [GIN] 2025/04/11 - 18:28:38 | 200 | 6.191286ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: time=2025-04-11T18:32:23.545+08:00 level=INFO source=download.go:177 msg="downloading dfd94f00498e in 9 100 MB part(s)"
ollama[22060]: time=2025-04-11T18:35:09.517+08:00 level=INFO source=download.go:177 msg="downloading 39bfa1773b74 in 1 201 B part(s)"
ollama[22060]: [GIN] 2025/04/11 - 18:35:16 | 200 | 8m26s | 127.0.0.1 | POST "/api/pull"
ollama[22060]: [GIN] 2025/04/11 - 18:35:31 | 200 | 85.458µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 18:35:35 | 200 | 1.288213ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 18:35:35 | 200 | 64.751µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: time=2025-04-11T18:36:18.824+08:00 level=INFO source=sched.go:716 msg="new model will fit in available VRAM in single GPU, loading" model=/var/lib/ollama/blobs/sha256-07f73c8ce8afe9ee878cafe45e10f560dd594c38cd7912e62f868a57373c6d14 gpu=GPU-8b6264f1-0cf1-ba9f-4441-21d371506733 parallel=4 available=8055685120 required="4.1 GiB"
ollama[22060]: time=2025-04-11T18:36:18.951+08:00 level=INFO source=server.go:105 msg="system memory" total="15.3 GiB" free="7.6 GiB" free_swap="7.2 GiB"
ollama[22060]: time=2025-04-11T18:36:18.952+08:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=35 layers.offload=35 layers.split="" memory.available="[7.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.1 GiB" memory.required.partial="4.1 GiB" memory.required.kv="682.0 MiB" memory.required.allocations="[4.1 GiB]" memory.weights.total="1.6 GiB" memory.weights.repeating="1.1 GiB" memory.weights.nonrepeating="525.1 MiB" memory.graph.full="517.1 MiB" memory.graph.partial="1.0 GiB" projector.weights="811.8 MiB" projector.graph="0 B"
ollama[22060]: time=2025-04-11T18:36:19.005+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama[22060]: time=2025-04-11T18:36:19.024+08:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /var/lib/ollama/blobs/sha256-07f73c8ce8afe9ee878cafe45e10f560dd594c38cd7912e62f868a57373c6d14 --ctx-size 8192 --batch-size 512 --n-gpu-layers 35 --threads 6 --parallel 4 --port 40381"
ollama[22060]: time=2025-04-11T18:36:19.024+08:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
ollama[22060]: time=2025-04-11T18:36:19.024+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
ollama[22060]: time=2025-04-11T18:36:19.025+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
ollama[22060]: time=2025-04-11T18:36:19.038+08:00 level=INFO source=runner.go:816 msg="starting ollama engine"
ollama[22060]: time=2025-04-11T18:36:19.038+08:00 level=INFO source=runner.go:879 msg="Server listening on 127.0.0.1:40381"
ollama[22060]: time=2025-04-11T18:36:19.092+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
ollama[22060]: time=2025-04-11T18:36:19.092+08:00 level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q2_K name=Gemma-3-4B-It description="" num_tensors=444 num_key_values=35
ollama[22060]: time=2025-04-11T18:36:19.276+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ollama[22060]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ollama[22060]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ollama[22060]: ggml_cuda_init: found 1 CUDA devices:
ollama[22060]: Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability 8.9, VMM: yes
ollama[22060]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
ollama[22060]: load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
ollama[22060]: time=2025-04-11T18:36:19.493+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,530,600,610,620,700,720,750,800,860,870,890,900 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
ollama[22060]: time=2025-04-11T18:36:19.585+08:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="1.6 GiB"
ollama[22060]: time=2025-04-11T18:36:19.585+08:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="525.1 MiB"
ollama[22060]: time=2025-04-11T18:36:20.251+08:00 level=INFO source=ggml.go:388 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
ollama[22060]: time=2025-04-11T18:36:20.251+08:00 level=INFO source=ggml.go:388 msg="compute graph" backend=CPU buffer_type=CUDA_Host
ollama[22060]: time=2025-04-11T18:36:20.260+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama[22060]: time=2025-04-11T18:36:20.280+08:00 level=INFO source=server.go:619 msg="llama runner started in 1.26 seconds"
ollama[22060]: time=2025-04-11T18:36:20.310+08:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input"
ollama[22060]: [GIN] 2025/04/11 - 18:36:20 | 200 | 1.771292922s | 127.0.0.1 | POST "/api/chat"
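The repeated "key not found" warnings for gemma3.vision.* above are the telltale sign: Ollama's engine expects the vision metadata and projector data inside the single GGUF it pulled, and this file only contains the text model. A minimal sketch to verify that yourself, assuming the gguf Python package from the llama.cpp repo and a hypothetical local copy of the file:

```python
# Sketch: list vision-related metadata keys in a GGUF file.
# Assumes `pip install gguf`; the filename below is hypothetical.
from gguf import GGUFReader

reader = GGUFReader("gemma-3-4b-it-Q2_K.gguf")

# reader.fields maps metadata key names to their values.
vision_keys = [name for name in reader.fields if "vision" in name]
print("total metadata keys:", len(reader.fields))
print("vision metadata keys:", vision_keys or "none found")

# "none found" matches the gemma3.vision.* warnings in the log: the text
# GGUF carries no projector metadata, so Ollama refuses image input,
# while llama.cpp reads it from the separate --mmproj file instead.
```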
So I asked the Ollama folks. They have a unique way of doing GGUFs: they integrate the mmproj into the actual model file, so unfortunately there's nothing we can do about it :(
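That would explain why the version from the Ollama library (e.g. ollama run gemma3:4b) handles images: the projector data is baked into the model file itself, while the GGUFs here ship it as a separate mmproj file, which works with llama.cpp's --mmproj flag but not with Ollama.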