Vision Support?

by sidkandan - opened 4 days ago

4 days ago

"This current checkpoint only supports text input. We are actively working to roll out full multimodal features and are collaborating with open-source partners to bring Gemma 3n to the open-source community in the coming weeks."

Any estimates on when Image Input / Vision Capabilities will be available?

Thanks again for all the hard work to optimize this! :)

BalakrishnaCh

Google org 4 days ago

This comment has been hidden (marked as Off-Topic)

lkv

Google org 4 days ago


edited 4 days ago

Hi @sidkandan,

This is the preview repo for the Gemma 3n models. The Gemma 3n repos on Hugging Face, for both the 2B and 4B models, offer these capabilities. To learn more about the Gemma 3n models, kindly refer to this link.

Thanks so much for your enthusiasm for the full multimodal capabilities of the Gemma 3n LiteRT preview models! We have noted this request and will route it to the relevant team for consideration. Thank you.
