Does not support multimodal input
#5 opened by RamboRogers
Is there a way to make this multimodal like Gemma 3 is?
> Is there a way to make this multimodal like Gemma 3 is?
Our upload does support multimodal. What are you using this on?
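For example, the image path through transformers looks roughly like this (a minimal sketch, assuming a transformers version that ships Gemma3ForConditionalGeneration; the dtype and device settings are illustrative and may need adjusting for your hardware):

```python
# Minimal multimodal sketch for the 4-bit upload, assuming recent transformers
# with Gemma 3 support. Adjust dtype/device for your setup.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "unsloth/gemma-3-27b-it-unsloth-bnb-4bit"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# apply_chat_template fetches/encodes the image and builds the full prompt,
# including the image tokens the model expects.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```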
> Is there a way to make this multimodal like Gemma 3 is?
+1
When trying to use the following conversation structure with unsloth/gemma-3-27b-it-unsloth-bnb-4bit:
```python
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."}
        ]
    }
]
```
The model returns the following:
["user\nYou are a helpful assistant.\n\nDescribe this image.\nmodel\nOkay, let's describe the image!\n\nThe image shows a cozy and inviting living room scene. Here's a breakdown of what I see:\n\n* **Setting:** It appears to be a living room, likely in a home. The style is warm and inviting, with a focus on comfort.\n"]
However, the actual content of the image is completely different. I also tried giving different URLs as input, but the answer is identical ("The image shows a cozy and inviting..."). The rendered prompt above also contains no image tokens at all, which suggests the image is being dropped before it ever reaches the model.
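One rough way to confirm this (a sketch, assuming the messages list above and the standard Gemma 3 chat template) is to render the prompt as plain text and check whether the image placeholder survives templating:

```python
# Rough diagnostic: render the chat template as text without tokenizing.
# With the standard Gemma 3 template, the image entry should appear as a
# <start_of_image> placeholder; if it is missing, the template (or the
# pipeline calling it) is discarding the image before generation.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("unsloth/gemma-3-27b-it-unsloth-bnb-4bit")
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
```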
> Is there a way to make this multimodal like Gemma 3 is?
+1
What are you using this on?
Ollama
Maybe it's something that needs to be set in Ollama?
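If it is Ollama, note that its REST API expects images as base64-encoded strings in a message's images field rather than as URLs inside the content. A minimal sketch (the model tag gemma3:27b and the local file bee.jpg are assumptions to adjust for your setup):

```python
# Minimal sketch of multimodal chat via Ollama's REST API.
# Assumes a local Ollama server and a pulled multimodal Gemma 3 tag
# (the exact tag name, e.g. "gemma3:27b", may differ on your machine).
import base64
import requests

with open("bee.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3:27b",
        "messages": [
            {
                "role": "user",
                "content": "Describe this image.",
                "images": [image_b64],  # base64 payload, not a URL
            }
        ],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```

If the model tag in Ollama was built text-only, the images field is silently ignored, which would produce exactly the kind of hallucinated description reported above.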