Is there an efficient way to do diarization while keeping this model's accuracy at transcribing multi-language audio?

#26

by raresmose - opened Oct 12, 2024

Oct 12, 2024

I am looking to detect speakers efficiently but haven't found a way.

I've tried the most popular solutions, such as AssemblyAI, but they only work for English, and I need a multi-language solution.

Do you know any?

psimm

Nov 27, 2024

https://github.com/Vaibhavs10/insanely-fast-whisper works with this model and combines it with pyannote for diarization.
Diarization quality isn't great though.

raresmose

May 23

edited May 23

Thanks @psimm! I finally figured out how to get amazing quality on both diarization and multi-language transcription, but I had to build a custom solution.

muaviyaijaz123

Jun 8

edited Jun 8

@raresmose could you please share that approach for diarization? I am also working on diarization for German, but I'm not getting quality results.

raresmose

Jun 17

edited Jun 17

@muaviyaijaz123 I used the pyannote API. They have higher accuracy models there.

However, when I implemented this you still had to write a custom algorithm to merge the transcription and diarization segments.

I'm not sure if there's a better way to do this now.
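
For anyone wondering what such a merging step might look like, here is a minimal sketch. It is not the exact algorithm used above, just one common approach: give each transcribed word the speaker whose diarization turn overlaps it the most in time, then join consecutive words from the same speaker. The `Segment` class, the function names, and the sample timestamps are all made up for illustration; real transcription and diarization outputs would need to be converted into this shape first.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for either a transcribed word or a speaker turn."""
    start: float  # seconds
    end: float    # seconds
    label: str    # word text (transcription) or speaker ID (diarization)


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time shared by two segments (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def gap(a: Segment, b: Segment) -> float:
    """Silence in seconds between two segments (0 if they touch or overlap)."""
    return max(0.0, b.start - a.end, a.start - b.end)


def assign_speakers(words, turns):
    """Label each word with the speaker of the best-matching turn."""
    labeled = []
    for word in words:
        if not turns:
            labeled.append(("UNKNOWN", word.label))
            continue
        # Prefer the turn that shares the most time with the word.
        best = max(turns, key=lambda t: overlap(word, t))
        if overlap(word, best) == 0.0:
            # The word falls in a gap between turns: use the closest turn.
            best = min(turns, key=lambda t: gap(word, t))
        labeled.append((best.label, word.label))
    return labeled


def group_utterances(labeled):
    """Join consecutive words from the same speaker into one utterance."""
    utterances = []
    for speaker, text in labeled:
        if utterances and utterances[-1][0] == speaker:
            utterances[-1] = (speaker, utterances[-1][1] + " " + text)
        else:
            utterances.append((speaker, text))
    return utterances


# Made-up sample data: word-level timestamps and two speaker turns.
words = [
    Segment(0.0, 0.5, "Hallo"),
    Segment(0.6, 1.0, "zusammen"),
    Segment(1.2, 1.6, "Bonjour"),
]
turns = [
    Segment(0.0, 1.1, "SPEAKER_00"),
    Segment(1.1, 2.0, "SPEAKER_01"),
]

for speaker, text in group_utterances(assign_speakers(words, turns)):
    print(f"{speaker}: {text}")
```

Working at word level rather than on whole transcription chunks tends to help when a speaker change happens mid-chunk, but it assumes the transcription step can return word timestamps.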

raresmose changed discussion status to closed Jul 1
