Transcribe Japanese audio to text
GPT-SoVITS for MITA!
Whisper model for transcribing Japanese audio to katakana.
Generate audio from text using reference audio
Generate audio from text using a voice synthesis model