Does not support multimodal input
#5 opened by RamboRogers
Is there a way to make this multimodal like Gemma 3 is?
> Is there a way to make this multimodal like Gemma 3 is?
Our upload does support multimodal. What are you using this on?
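For example, the image path through transformers looks roughly like this (a minimal sketch, assuming a transformers version that ships Gemma3ForConditionalGeneration; the dtype and device settings are illustrative and may need adjusting for your hardware):

```python
# Minimal multimodal sketch for the 4-bit upload, assuming recent transformers
# with Gemma 3 support. Adjust dtype/device for your setup.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "unsloth/gemma-3-27b-it-unsloth-bnb-4bit"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# apply_chat_template fetches/encodes the image and builds the full prompt,
# including the image tokens the model expects.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```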
> Is there a way to make this multimodal like Gemma 3 is?
+1
When trying to use the following conversation structure with unsloth/gemma-3-27b-it-unsloth-bnb-4bit:
```python
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."}
        ]
    }
]
```
The model returns the following:
["user\nYou are a helpful assistant.\n\nDescribe this image.\nmodel\nOkay, let's describe the image!\n\nThe image shows a cozy and inviting living room scene. Here's a breakdown of what I see:\n\n* **Setting:** It appears to be a living room, likely in a home. The style is warm and inviting, with a focus on comfort.\n"]
However, the actual content of the image is completely different. I also tried giving different URLs as input, but the answer is identical ("The image shows a cozy and inviting..."). The rendered prompt above also contains no image tokens at all, which suggests the image is being dropped before it ever reaches the model.
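One rough way to confirm this (a sketch, assuming the messages list above and the standard Gemma 3 chat template) is to render the prompt as plain text and check whether the image placeholder survives templating:

```python
# Rough diagnostic: render the chat template as text without tokenizing.
# With the standard Gemma 3 template, the image entry should appear as a
# <start_of_image> placeholder; if it is missing, the template (or the
# pipeline calling it) is discarding the image before generation.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("unsloth/gemma-3-27b-it-unsloth-bnb-4bit")
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
```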
> Is there a way to make this multimodal like Gemma 3 is?
+1
What are you using this on?
Ollama
Maybe it's something that needs to be set in Ollama?
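If it is Ollama, note that its REST API expects images as base64-encoded strings in a message's images field rather than as URLs inside the content. A minimal sketch (the model tag gemma3:27b and the local file bee.jpg are assumptions to adjust for your setup):

```python
# Minimal sketch of multimodal chat via Ollama's REST API.
# Assumes a local Ollama server and a pulled multimodal Gemma 3 tag
# (the exact tag name, e.g. "gemma3:27b", may differ on your machine).
import base64
import requests

with open("bee.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3:27b",
        "messages": [
            {
                "role": "user",
                "content": "Describe this image.",
                "images": [image_b64],  # base64 payload, not a URL
            }
        ],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```

If the model tag in Ollama was built text-only, the images field is silently ignored, which would produce exactly the kind of hallucinated description reported above.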