Any efficient way to do diarization while keeping this model's accuracy at transcribing multi-language audio?

#26
by raresmose - opened

I am looking to detect speakers efficiently but haven't found a way.

I've tried the most popular solutions out there, like AssemblyAI, but they only work for English, and I need a multi-language solution.

Do you know any?

https://github.com/Vaibhavs10/insanely-fast-whisper works with this model and combines it with pyannote for diarization.
Diarization quality isn't great though.
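
For anyone wondering what "combining" transcription with diarization can look like, here is a minimal sketch of one common approach: give each transcript chunk the speaker whose turn overlaps it the most in time. This is not taken from insanely-fast-whisper or pyannote. The `assign_speakers` helper, the chunk format (`start`, `end`, `text`), and the turn format (`(start, end, speaker)`) are all assumptions made for illustration, so the outputs of whichever ASR and diarization tools you use would need to be converted into these shapes first.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical formats, chosen for this sketch only:
#   chunk: {"start": float, "end": float, "text": str}  (from the ASR step)
#   turn:  (start, end, speaker_label)                   (from the diarization step)
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    chunks: List[Dict], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Label each transcript chunk with the speaker whose turn overlaps it most.

    Chunks that overlap no speaker turn are labeled None.
    """
    labeled = []
    for chunk in chunks:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(chunk["start"], chunk["end"], t_start, t_end)
            if shared > best_overlap:
                best_overlap, best_speaker = shared, speaker
        labeled.append((best_speaker, chunk["text"]))
    return labeled


# Made-up example data, not real model output.
example_chunks = [
    {"start": 0.0, "end": 2.5, "text": "Bonjour, comment ça va ?"},
    {"start": 2.5, "end": 5.0, "text": "I'm fine, thanks."},
    {"start": 9.0, "end": 10.0, "text": "(no speaker detected here)"},
]
example_turns = [(0.0, 2.8, "SPEAKER_00"), (2.8, 6.0, "SPEAKER_01")]

for speaker, text in assign_speakers(example_chunks, example_turns):
    print(speaker, text)
```

With this kind of alignment, the transcription stays untouched and only the speaker labels depend on the diarizer, so poor diarization shows up as wrong labels rather than wrong text.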

Thanks @psimm! In the end, I figured out how to get amazing quality on both diarization and multi-language transcription, but I had to build a custom solution.
