Quantized models with vision included?
I have not found any quantized models yet that retain vision component. Is that technically possible and are there examples?
I'm not sure about other quant formats, but for GGUF I've read that the llama.cpp author gave up on vision support, so naturally, if the format itself doesn't support it, there's no way to use it. I believe LM Studio still supports some LLMs with vision, but that support is probably limited to older models from the time when vision was still actively maintained in llama.cpp itself.
That's not entirely true, see https://github.com/ggml-org/llama.cpp/issues/8010
There's still a ton of refactoring to do though
Yeah vision is far from dead but is definitely a struggle
qwen2.5 vl has awq and gguf quants
There are many quants of vision models, but paying attention to the details in the model card is key - most of them mention that the quants don't actually include the vision part of the original unquantized model.
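For what it's worth, a quick heuristic for checking this from a repo's file listing (a sketch; the "mmproj" filename convention is an assumption here - llama.cpp-style quants usually ship the vision projector as a separate mmproj GGUF next to the model file, and the file names below are hypothetical):

```python
def has_vision_part(repo_files):
    """Heuristically check whether a quant repo ships the vision projector,
    which llama.cpp-style quants usually name something like '*mmproj*.gguf'."""
    return any(
        "mmproj" in name.lower() and name.lower().endswith(".gguf")
        for name in repo_files
    )

# Hypothetical file listing from a model repo:
files = ["gemma-3-4b-it-Q4_K_M.gguf", "mmproj-gemma-3-4b-it-f16.gguf"]
print(has_vision_part(files))  # True: the vision projector ships alongside the quant
```

You can feed it a real listing (e.g. from huggingface_hub's list_repo_files) to screen repos before downloading.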
Gemma 3 had vision support day 1 in llama.cpp. Because Google helped with the integration. But it appears like Mistral doesn't care enough.
That support is still limited to the CLI chat and isn't usable for any real project that requires API use (llama-server).
Ollama supports vision via API, but they have other problems with serving Gemma 3
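For anyone who wants the API route: Ollama's /api/generate endpoint takes base64-encoded images in an "images" array. A minimal sketch in Python (the model name and image bytes are placeholders):

```python
import base64

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    # Ollama's /api/generate accepts base64-encoded images in an "images" list
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

payload = build_vision_request("gemma3", "What is in this picture?", b"<png bytes>")
print(sorted(payload.keys()))

# To actually send it against a locally running Ollama (not executed here):
#   import json, urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/generate",
#                                data=json.dumps(payload).encode("utf-8"),
#                                headers={"Content-Type": "application/json"})
#   print(json.loads(urllib.request.urlopen(req).read())["response"])
```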
Can you point me to a GGUF model of Gemma 3 with vision support in LM Studio? I tried some that LM Studio marked as such, but it didn't work. Instead of actually analyzing the image content, the model simply roleplayed it: it told me it was trying to read what appeared to be letters in the image, said it wasn't sure what was written there but would try to read it anyway, and ended up writing random excerpts from the system prompt as the supposed writing shown in the image. 🤣
The Ollama library has a quantized Gemma 3, https://ollama.com/library/gemma3, which supports the vision layer. I tried it on Ollama and it works perfectly. Not sure about LM Studio.
But interestingly, no other GGUF model of Gemma 3 on Hugging Face supports the vision layer, as far as I've tried.
I asked for a model with vision support in LM Studio specifically because it is based on llama.cpp, as mentioned by Dampf. I have Ollama installed, but I'd prefer LM Studio for this. I'm not aware of a single UI for Ollama that supports passing images, while LM Studio has that feature built in. Trouble is, it doesn't really work, lol.
no other GGUF model of Gemma 3 on huggingface supports vision layer
It's because Ollama uses its own inference backend (doing GGML calls via CGO) for Gemma 3 instead of using llama.cpp
IIRC Bartowski's quants have the mmproj, so you should be able to use them via the cli (llama-gemma3-cli).
Thank you for the information. That really solved my confusion.