Quantized models with vision included?
I have not found any quantized models yet that retain vision component. Is that technically possible and are there examples?
I'm not sure about other quant formats, but for GGUF I've read that the llama.cpp author gave up on vision support, so naturally, if the format itself doesn't support it, there's no way to use it. I believe LM Studio still supports some LLMs with vision, but that support is probably limited to older models from the time when vision was still actively maintained in llama.cpp itself.
That's not entirely true, see https://github.com/ggml-org/llama.cpp/issues/8010
There's still a ton of refactoring to do though
Yeah vision is far from dead but is definitely a struggle
qwen2.5 vl has awq and gguf quants
There are many quants of vision models, but paying attention to the details in the model card is key - most of them mention that the quants don't actually include the vision part of the original unquantized model.
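For what it's worth, a quick heuristic for checking this from a repo's file listing (a sketch; the "mmproj" filename convention is an assumption here - llama.cpp-style quants usually ship the vision projector as a separate mmproj GGUF next to the model file, and the file names below are hypothetical):

```python
def has_vision_part(repo_files):
    """Heuristically check whether a quant repo ships the vision projector,
    which llama.cpp-style quants usually name something like '*mmproj*.gguf'."""
    return any(
        "mmproj" in name.lower() and name.lower().endswith(".gguf")
        for name in repo_files
    )

# Hypothetical file listing from a model repo:
files = ["gemma-3-4b-it-Q4_K_M.gguf", "mmproj-gemma-3-4b-it-f16.gguf"]
print(has_vision_part(files))  # True: the vision projector ships alongside the quant
```

You can feed it a real listing (e.g. from huggingface_hub's list_repo_files) to screen repos before downloading.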
Gemma 3 had vision support day 1 in llama.cpp. Because Google helped with the integration. But it appears like Mistral doesn't care enough.
That support is still limited to the CLI chat and isn't usable for any real project that requires API use (llama-server).
Ollama supports vision via API, but they have other problems with serving Gemma 3
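For anyone who wants the API route: Ollama's /api/generate endpoint takes base64-encoded images in an "images" array. A minimal sketch in Python (the model name and image bytes are placeholders):

```python
import base64

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    # Ollama's /api/generate accepts base64-encoded images in an "images" list
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

payload = build_vision_request("gemma3", "What is in this picture?", b"<png bytes>")
print(sorted(payload.keys()))

# To actually send it against a locally running Ollama (not executed here):
#   import json, urllib.request
#   req = urllib.request.Request("http://localhost:11434/api/generate",
#                                data=json.dumps(payload).encode("utf-8"),
#                                headers={"Content-Type": "application/json"})
#   print(json.loads(urllib.request.urlopen(req).read())["response"])
```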
Can you point me to a GGUF model of Gemma 3 with vision support in LM Studio? I tried some that LM Studio marked as such, but it didn't work. Instead of actually analyzing the image content, the model simply roleplayed it: it told me it was trying to read what appeared to be letters in the image, said it wasn't sure what was written there but would try to read it anyway, and ended up writing random excerpts from the system prompt as the supposed writing shown in the image. 🤣
The Ollama library has a quantized Gemma 3, https://ollama.com/library/gemma3, which supports the vision layer. I tried it on Ollama and it works perfectly. Not sure about LM Studio.
But interestingly, no other GGUF model of Gemma 3 on Hugging Face supports the vision layer, as far as I've tried.
I asked for a model with vision support in LM Studio specifically because it is based on llama.cpp, as mentioned by Dampf. I have Ollama installed, but I'd prefer LM Studio for this. I'm not aware of a single UI for Ollama that supports passing images, while LM Studio has that feature built in. Trouble is, it doesn't really work, lol.
no other GGUF model of Gemma 3 on huggingface supports vision layer
It's because Ollama uses its own inference backend (doing GGML calls via CGO) for Gemma 3 instead of using llama.cpp
IIRC Bartowski's quants have the mmproj, so you should be able to use them via the cli (llama-gemma3-cli).
Thank you for the information. That really solved my confusion.