Multimodal support
#2 · opened by Nasa1423
I understand Gemini was built to be natively multimodal. Could you elaborate on the current capabilities, especially regarding real-time processing of combined audio and video inputs? Furthermore, what does the development roadmap look like for expanding these core multimodal features?
Currently, this GGUF only supports text, as noted in the model description. Hopefully llama.cpp will add support for the other modalities soon.
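For reference, a minimal text-only inference sketch using the llama-cpp-python bindings (which wrap llama.cpp); the GGUF filename here is a placeholder, so substitute the actual quant file from this repo:

```python
# Minimal text-only inference with llama-cpp-python.
# "model-Q4_K_M.gguf" is a hypothetical filename; use the quant from this repo.
from llama_cpp import Llama

llm = Llama(model_path="model-Q4_K_M.gguf")  # loads the text-only GGUF

# Only plain text prompts work; there is no audio/video input path here.
out = llm("Explain what a GGUF file is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```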
OK, now I see that it's a llama.cpp restriction, not something specific to this quant. Thanks!