Multimodal support

#2
by Nasa1423 - opened

I understand Gemini was built to be natively multimodal. Could you elaborate on the current capabilities, especially regarding real-time processing of combined audio and video inputs? Furthermore, what does the development roadmap look like for expanding these core multimodal features?

Unsloth AI org

Currently this GGUF only supports text, as noted in the description. Hopefully llama.cpp will be able to support all modalities soon.
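For reference, text-only inference with a GGUF like this can be run with llama.cpp's `llama-cli`. The model filename below is a placeholder, not a file from this repo:

```shell
# Illustrative invocation; swap in the path to your downloaded GGUF.
# llama-cli takes only a text prompt here -- image/audio inputs are not
# supported for this quant under llama.cpp at the time of this thread.
./llama-cli \
  -m ./models/model-Q4_K_M.gguf \
  -p "Summarize what native multimodality means for an LLM." \
  -n 256
```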

Ok, now I see that it is a llama.cpp limitation, not something specific to this quant. Thanks!
