Multimodal support
#2 · opened by Nasa1423
I understand Gemini was built to be natively multimodal. Could you elaborate on the current capabilities, especially regarding real-time processing of combined audio and video inputs? Furthermore, what does the development roadmap look like for expanding these core multimodal features?
Currently, this GGUF only supports text, as noted in the model description. Hopefully llama.cpp will add support for the other modalities soon.
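For reference, a minimal text-only inference sketch using the llama-cpp-python bindings (which wrap llama.cpp); the GGUF filename here is a placeholder, so substitute the actual quant file from this repo:

```python
# Minimal text-only inference with llama-cpp-python.
# "model-Q4_K_M.gguf" is a hypothetical filename; use the quant from this repo.
from llama_cpp import Llama

llm = Llama(model_path="model-Q4_K_M.gguf")  # loads the text-only GGUF

# Only plain text prompts work; there is no audio/video input path here.
out = llm("Explain what a GGUF file is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```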
OK, now I see that it's a llama.cpp restriction, not something specific to this quant. Thanks!