Vision not working in ollama
#4 · opened by ocarson
The GGUFs hosted here don't seem to work in Ollama with images, while the version on Ollama's own site does. Any way to fix this?
Do you know if it works on llama.cpp? :)
+1
> Do you know if it works on llama.cpp? :)
It works on my machine:
llama-gemma3-cli -m gemma-3-4b-it-Q2_K.gguf --mmproj mmproj-BF16.gguf --image some.png -p "What can you see?" -ngl 34
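(For context: llama.cpp loads the text model and the vision projector as two separate files. -m points at the language-model GGUF, --mmproj at the projector GGUF, --image at the input picture, and -ngl 34 offloads 34 layers to the GPU.)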
No, it's not working in Ollama, with or without Open WebUI.

Without Open WebUI:
$ ollama run hf.co/unsloth/gemma-3-4b-it-GGUF:Q2_K_L
pulling manifest
pulling 84995a47f7a2... 100% ▕████████████████████████████████████████████████████████████▏ 1.9 GB
pulling e0a42594d802... 100% ▕████████████████████████████████████████████████████████████▏ 358 B
pulling dfd94f00498e... 100% ▕████████████████████████████████████████████████████████████▏ 851 MB
pulling 39bfa1773b74... 100% ▕████████████████████████████████████████████████████████████▏ 201 B
pulling 6b67f0f2e01a... 100% ▕████████████████████████████████████████████████████████████▏ 195 B
verifying sha256 digest
writing manifest
success
>>> /home/bkjzon/Pictures/c7ed2064f5aaa6ceaeec216f3bfddd9d22a444ab-2913263995.png
Added image '/home/bkjzon/Pictures/c7ed2064f5aaa6ceaeec216f3bfddd9d22a444ab-2913263995.png'
Error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input
Here is the log from journalctl -u ollama -b (Arch Linux):
systemd[1]: Started Ollama Service.
ollama[22060]: 2025/04/11 14:49:31 routes.go:1231: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
ollama[22060]: time=2025-04-11T14:49:31.920+08:00 level=INFO source=images.go:458 msg="total blobs: 37"
ollama[22060]: time=2025-04-11T14:49:31.920+08:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
ollama[22060]: time=2025-04-11T14:49:31.921+08:00 level=INFO source=routes.go:1298 msg="Listening on 127.0.0.1:11434 (version 0.6.5)"
ollama[22060]: time=2025-04-11T14:49:31.921+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
ollama[22060]: time=2025-04-11T14:49:32.082+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-8b6264f1-0cf1-ba9f-4441-21d371506733 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060 Laptop GPU" total="7.6 GiB" available="7.2 GiB"
ollama[22060]: [GIN] 2025/04/11 - 16:41:08 | 200 | 11.166505ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 16:41:08 | 200 | 115.634µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 16:41:17 | 200 | 1.355552ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 17:17:54 | 200 | 204.293µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 17:38:53 | 200 | 12.776021ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 17:38:53 | 200 | 57.172µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 17:38:59 | 200 | 2.086745ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 18:26:43 | 200 | 9.490859ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: time=2025-04-11T18:26:56.027+08:00 level=INFO source=download.go:177 msg="downloading 07f73c8ce8af in 16 108 MB part(s)"
ollama[22060]: [GIN] 2025/04/11 - 18:28:38 | 200 | 6.191286ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: time=2025-04-11T18:32:23.545+08:00 level=INFO source=download.go:177 msg="downloading dfd94f00498e in 9 100 MB part(s)"
ollama[22060]: time=2025-04-11T18:35:09.517+08:00 level=INFO source=download.go:177 msg="downloading 39bfa1773b74 in 1 201 B part(s)"
ollama[22060]: [GIN] 2025/04/11 - 18:35:16 | 200 | 8m26s | 127.0.0.1 | POST "/api/pull"
ollama[22060]: [GIN] 2025/04/11 - 18:35:31 | 200 | 85.458µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: [GIN] 2025/04/11 - 18:35:35 | 200 | 1.288213ms | 127.0.0.1 | GET "/api/tags"
ollama[22060]: [GIN] 2025/04/11 - 18:35:35 | 200 | 64.751µs | 127.0.0.1 | GET "/api/version"
ollama[22060]: time=2025-04-11T18:36:18.824+08:00 level=INFO source=sched.go:716 msg="new model will fit in available VRAM in single GPU, loading" model=/var/lib/ollama/blobs/sha256-07f73c8ce8afe9ee878cafe45e10f560dd594c38cd7912e62f868a57373c6d14 gpu=GPU-8b6264f1-0cf1-ba9f-4441-21d371506733 parallel=4 available=8055685120 required="4.1 GiB"
ollama[22060]: time=2025-04-11T18:36:18.951+08:00 level=INFO source=server.go:105 msg="system memory" total="15.3 GiB" free="7.6 GiB" free_swap="7.2 GiB"
ollama[22060]: time=2025-04-11T18:36:18.952+08:00 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=35 layers.offload=35 layers.split="" memory.available="[7.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.1 GiB" memory.required.partial="4.1 GiB" memory.required.kv="682.0 MiB" memory.required.allocations="[4.1 GiB]" memory.weights.total="1.6 GiB" memory.weights.repeating="1.1 GiB" memory.weights.nonrepeating="525.1 MiB" memory.graph.full="517.1 MiB" memory.graph.partial="1.0 GiB" projector.weights="811.8 MiB" projector.graph="0 B"
ollama[22060]: time=2025-04-11T18:36:19.005+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:19.007+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama[22060]: time=2025-04-11T18:36:19.018+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama[22060]: time=2025-04-11T18:36:19.024+08:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /var/lib/ollama/blobs/sha256-07f73c8ce8afe9ee878cafe45e10f560dd594c38cd7912e62f868a57373c6d14 --ctx-size 8192 --batch-size 512 --n-gpu-layers 35 --threads 6 --parallel 4 --port 40381"
ollama[22060]: time=2025-04-11T18:36:19.024+08:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
ollama[22060]: time=2025-04-11T18:36:19.024+08:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
ollama[22060]: time=2025-04-11T18:36:19.025+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
ollama[22060]: time=2025-04-11T18:36:19.038+08:00 level=INFO source=runner.go:816 msg="starting ollama engine"
ollama[22060]: time=2025-04-11T18:36:19.038+08:00 level=INFO source=runner.go:879 msg="Server listening on 127.0.0.1:40381"
ollama[22060]: time=2025-04-11T18:36:19.092+08:00 level=WARN source=ggml.go:152 msg="key not found" key=general.description default=""
ollama[22060]: time=2025-04-11T18:36:19.092+08:00 level=INFO source=ggml.go:67 msg="" architecture=gemma3 file_type=Q2_K name=Gemma-3-4B-It description="" num_tensors=444 num_key_values=35
ollama[22060]: time=2025-04-11T18:36:19.276+08:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
ollama[22060]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ollama[22060]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ollama[22060]: ggml_cuda_init: found 1 CUDA devices:
ollama[22060]: Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability 8.9, VMM: yes
ollama[22060]: load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
ollama[22060]: load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
ollama[22060]: time=2025-04-11T18:36:19.493+08:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,530,600,610,620,700,720,750,800,860,870,890,900 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
ollama[22060]: time=2025-04-11T18:36:19.585+08:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CUDA0 size="1.6 GiB"
ollama[22060]: time=2025-04-11T18:36:19.585+08:00 level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="525.1 MiB"
ollama[22060]: time=2025-04-11T18:36:20.251+08:00 level=INFO source=ggml.go:388 msg="compute graph" backend=CUDA0 buffer_type=CUDA0
ollama[22060]: time=2025-04-11T18:36:20.251+08:00 level=INFO source=ggml.go:388 msg="compute graph" backend=CPU buffer_type=CUDA_Host
ollama[22060]: time=2025-04-11T18:36:20.260+08:00 level=WARN source=ggml.go:152 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.num_channels default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.block_count default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.embedding_length default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.head_count default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.image_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.patch_size default=0
ollama[22060]: time=2025-04-11T18:36:20.262+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.vision.attention.layer_norm_epsilon default=0
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.local.freq_base default=10000
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.rope.freq_scale default=1
ollama[22060]: time=2025-04-11T18:36:20.266+08:00 level=WARN source=ggml.go:152 msg="key not found" key=gemma3.mm_tokens_per_image default=256
ollama[22060]: time=2025-04-11T18:36:20.280+08:00 level=INFO source=server.go:619 msg="llama runner started in 1.26 seconds"
ollama[22060]: time=2025-04-11T18:36:20.310+08:00 level=INFO source=server.go:789 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input"
ollama[22060]: [GIN] 2025/04/11 - 18:36:20 | 200 | 1.771292922s | 127.0.0.1 | POST "/api/chat"
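The repeated "key not found" warnings for gemma3.vision.* above are the telltale sign: Ollama's engine expects the vision metadata and projector data inside the single GGUF it pulled, and this file only contains the text model. A minimal sketch to verify that yourself, assuming the gguf Python package from the llama.cpp repo and a hypothetical local copy of the file:

```python
# Sketch: list vision-related metadata keys in a GGUF file.
# Assumes `pip install gguf`; the filename below is hypothetical.
from gguf import GGUFReader

reader = GGUFReader("gemma-3-4b-it-Q2_K.gguf")

# reader.fields maps metadata key names to their values.
vision_keys = [name for name in reader.fields if "vision" in name]
print("total metadata keys:", len(reader.fields))
print("vision metadata keys:", vision_keys or "none found")

# "none found" matches the gemma3.vision.* warnings in the log: the text
# GGUF carries no projector metadata, so Ollama refuses image input,
# while llama.cpp reads it from the separate --mmproj file instead.
```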
So I asked the Ollama folks. They have a unique way of doing GGUFs: they integrate the mmproj into the actual model file, so unfortunately there's nothing we can do about it :(
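That would explain why the version from the Ollama library (e.g. ollama run gemma3:4b) handles images: the projector data is baked into the model file itself, while the GGUFs here ship it as a separate mmproj file, which works with llama.cpp's --mmproj flag but not with Ollama.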