Does not support multimodal input

#5
by RamboRogers - opened

Is there a way to make this multimodal like Gemma3 is?

Unsloth AI org

Is there a way to make this multimodal like Gemma3 is?

Our upload does support multimodal input. What are you running this on?

Is there a way to make this multimodal like Gemma3 is?

+1

When I use the following conversation structure with unsloth/gemma-3-27b-it-unsloth-bnb-4bit:

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image ."}
        ]
    }
]

The model returns the following:

["user\nYou are a helpful assistant.\n\nDescribe this image.\nmodel\nOkay, let's describe the image!\n\nThe image shows a cozy and inviting living room scene. Here's a breakdown of what I see:\n\n* **Setting:** It appears to be a living room, likely in a home. The style is warm and inviting, with a focus on comfort.\n"]

The actual content of the image is completely different, though. I also tried different URLs as input, but the answer is always identical ("The image shows a cozy and inviting...").
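One rough way to narrow this down is to check whether the image ever reaches the model. The sketch below assumes the Hugging Face transformers convention that a vision processor's output carries the image under a `pixel_values` key; the thread does not say which runtime is in use, so that key is an assumption, and the helper names `count_image_parts` and `image_was_dropped` are made up for illustration.

```python
def count_image_parts(messages):
    """Count content parts of type "image" across all chat messages."""
    total = 0
    for message in messages:
        content = message.get("content", [])
        if isinstance(content, str):
            continue  # plain-string content cannot carry an image part
        total += sum(1 for part in content if part.get("type") == "image")
    return total


def image_was_dropped(messages, encoded_inputs):
    """Return True if the chat contains images but the encoded inputs hold no pixel data.

    Assumption: `encoded_inputs` is a dict-like object in which image data,
    when present, is stored under the key "pixel_values".
    """
    has_pixels = "pixel_values" in encoded_inputs
    return count_image_parts(messages) > 0 and not has_pixels
```

If `image_was_dropped(messages, inputs)` comes back true for the encoded inputs, the prompt was most likely built by a text-only tokenizer rather than the model's image-aware processor. That would fit the decoded output above, where the prompt shows no trace of the image.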

Unsloth AI org

Is there a way to make this multimodal like Gemma3 is?

+1

Where are you using this?

Maybe it's something that needs to be set in Ollama?
