Transcribe spoken words into text
Separate music tracks from audio
Chat with a bot using text and audio