How to get speech synthesized and speech recognized?

by supercharge19 - opened 12 days ago

12 days ago

Is the file mmproj-model-f16.gguf used only for image understanding, or can it also be used for ASR (automatic speech recognition) and/or TTS (speech synthesis)? If not, how can this GGUF model be used for those tasks?

Please explain in detail.

S01aris

4 days ago

The README shows that this repo only supports image inputs. Support for audio is still in development.

tc-mb

OpenBMB org 4 days ago

I am also developing the other parts of minicpm-omni, including the ASR part, as quickly as possible. They will be open-sourced to the community in the near future.

supercharge19

3 days ago

Thank you for working on this great model. I wish I could help, though my skills are limited. For learning purposes, what could I do to help develop the audio synthesis and ASR parts?
