How can I get speech synthesis and speech recognition working?
Discussion #3, opened by supercharge19
Is the file mmproj-model-f16.gguf used only for image understanding, or can it also be used for ASR (automatic speech recognition) and/or speech synthesis (TTS)? If not, how can this GGUF model be used for those tasks?
Please explain in detail.
As the README states, this repo currently supports only image inputs. Audio support is still in development.
I am also developing the other parts of minicpm-omni as quickly as possible, including the ASR part, and they will be open-sourced to the community in the near future.
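If you want to check for yourself which modalities a projector file covers, you can read its GGUF header metadata. The sketch below is a minimal, dependency-free reader for the GGUF v2/v3 key/value header, written against the public GGUF format description. It is not part of this repo. The assumption to flag loudly: as far as I know, llama.cpp's image-encoder converter records which encoders a projector contains under boolean keys such as clip.has_vision_encoder, but the exact key names vary between converter versions, so the "clip.has_" filter in encoder_flags is a hypothetical heuristic, not a documented contract.

```python
import io
import struct
from typing import Any, BinaryIO, Dict

# GGUF value-type codes mapped to little-endian struct formats (scalars only).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read exactly one struct-formatted value or fail on truncation."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    """Decode one metadata value of the given GGUF type code."""
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")  # element type code
        count = _read(f, "<Q")      # number of elements
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Return the key/value metadata from the header of a GGUF v2/v3 stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF version {version} is not handled by this sketch")
    _read(f, "<Q")              # tensor count (not needed here)
    kv_count = _read(f, "<Q")   # number of metadata key/value pairs
    meta: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta


def metadata_from_bytes(data: bytes) -> Dict[str, Any]:
    """Convenience wrapper for parsing a header already held in memory."""
    return read_gguf_metadata(io.BytesIO(data))


def encoder_flags(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical heuristic: keep keys that look like encoder-presence flags."""
    return {k: v for k, v in meta.items() if k.startswith("clip.has_")}
```

Usage: open the projector with open("mmproj-model-f16.gguf", "rb") as f, call read_gguf_metadata(f), and print encoder_flags(...) on the result. If only vision-related flags are set, that is consistent with the README's statement that the file handles image inputs only.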
Thank you for working on this great model. I wish I could help, though my skills are limited. For learning purposes, though, what could I do to develop the speech synthesis and ASR parts?