Speech transcriptions
Voice conversion model. Audio must be sampled at 16 kHz
Enter text in English to generate speech (TTS)
Classify audio to identify spoken words
Voice command recognition