Using this model in ollama - lacks vision capability?

#2 · opened by ksw74

"ollama run hf.co/mlabonne/gemma-3-27b-it-qat-abliterated-GGUF:Q5_K_M" works well for text input, but adding an image to the command line doesn't work and per "/show info" there is no vision capability.

Is this intentional, or a known limitation?
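
For reference, with a vision-capable model Ollama picks an image up when its file path is included in the prompt, roughly like this (the image path is just a placeholder):

    ollama run hf.co/mlabonne/gemma-3-27b-it-qat-abliterated-GGUF:Q5_K_M "Describe this image: ./photo.jpg"

Here, the path just gets treated as ordinary text.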

It's due to the non-standard structure of the model. I'll try to fix it in the next version, but in the meantime it's possible to add the mmproj file.
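
For anyone who wants to try the mmproj route before the next version lands, a minimal sketch using llama.cpp's multimodal CLI (the mmproj filename and image path are placeholders; you'd need a projector GGUF that matches Gemma 3's vision tower, and older llama.cpp builds name the binary differently):

    llama-mtmd-cli -m gemma-3-27b-it-qat-abliterated.Q5_K_M.gguf \
        --mmproj mmproj-gemma-3-27b-it-f16.gguf \
        --image ./photo.jpg \
        -p "Describe this image."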

Will the model work for vision if I load it in Python with transformers rather than using Ollama?

Great, thanks, I'll give it a go.

No luck loading the model via transformers, unfortunately. If I try Gemma3ForConditionalGeneration.from_pretrained, I get a warning "You are using a model of type gemma3_text to instantiate a model of type gemma3. This is not supported for all configurations of models and can yield errors." followed by a "size mismatch for weight" error.

If I instead try the generic pipeline(task="image-text-to-text", ...), I get "ValueError: Unrecognized configuration class <class 'transformers.models.gemma.configuration_gemma.GemmaConfig'> for this kind of AutoModel: AutoModelForImageTextToText. Model type should be one of AriaConfig, AyaVisionConfig, BlipConfig, Blip2Config, ChameleonConfig, Emu3Config, FuyuConfig, Gemma3Config, GitConfig, GotOcr2Config, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InternVLConfig, JanusConfig, Kosmos2Config, Llama4Config, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, Mistral3Config, MllamaConfig, PaliGemmaConfig, Pix2StructConfig, PixtralVisionConfig, Qwen2_5_VLConfig, Qwen2VLConfig, ShieldGemma2Config, SmolVLMConfig, UdopConfig, VipLlavaConfig, VisionEncoderDecoderConfig."

Actually, I can load the model with AutoModel.from_pretrained, but then calling model.generate fails with "AttributeError: 'Gemma3TextModel' object has no attribute 'generate'".
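
For comparison, a minimal sketch of the multimodal loading path that works for the original google/gemma-3-27b-it checkpoint in recent transformers (4.50+); the image URL and generation settings are placeholders, and since this repo's config reports gemma3_text, the same code presumably hits the size-mismatch error above here:

    # Minimal multimodal load sketch for the original Gemma 3 checkpoint
    # (assumes transformers >= 4.50; image URL is a placeholder).
    import torch
    from transformers import AutoProcessor, Gemma3ForConditionalGeneration

    model_id = "google/gemma-3-27b-it"  # the original vision-capable weights
    model = Gemma3ForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = [{"role": "user", "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},
        {"type": "text", "text": "Describe this image."},
    ]}]
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)

    out = model.generate(**inputs, max_new_tokens=100)
    print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))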
