---
license: mit
library_name: transformers
pipeline_tag: voice-activity-detection
tags:
- speaker
- speaker-diarization
- meeting
- wavlm
- wespeaker
- diarizen
- pyannote
- pyannote-audio-pipeline
---

## Overview

This repository hosts the pre-trained [DiariZen](https://github.com/BUTSpeechFIT/DiariZen) model described in [BUT System for the MLC-SLM Challenge](https://huggingface.co/papers/2506.13414). The EEND component is built on WavLM-Large and Conformer layers. The model was pre-trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse. Structured pruning at 80% sparsity was then applied, and the pruned model was fine-tuned on [MLC-SLM](https://www.nexdata.ai/competition/mlc-slm) data.

## Usage

```python
from diarizen.pipelines.inference import DiariZenPipeline

# load pre-trained model
diar_pipeline = DiariZenPipeline.from_pretrained("BUT-FIT/diarizen-wavlm-large-s80-mlc")
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav')

# print results
for turn, _, speaker in diar_results.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")

# load pre-trained model and save RTTM result
diar_pipeline = DiariZenPipeline.from_pretrained(
    "BUT-FIT/diarizen-wavlm-large-s80-mlc",
    rttm_out_dir='.'
)
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav', sess_name='session_name')
```

## Results

DER (%) of the [Pyannote baseline](https://github.com/mubingshen/MLC-SLM-Baseline) and DiariZen, evaluated with **no collar** applied.

| Dataset            | Pyannote | DiariZen |
|:-------------------|:--------:|:--------:|
| English-American   | 20.18    | 15.88    |
| English-Australian | 13.76    | 10.82    |
| English-British    | 18.85    | 12.07    |
| English-Filipino   | 13.19    | 10.28    |
| English-Indian     | 8.19     | 6.04     |
| French             | 22.62    | 17.33    |
| German             | 22.33    | 16.35    |
| Italian            | 10.64    | 8.85     |
| Japanese           | 26.46    | 17.81    |
| Korean             | 23.25    | 16.36    |
| Portuguese         | 17.60    | 14.77    |
| Russian            | 11.37    | 9.99     |
| Spanish            | 12.92    | 10.82    |
| Thai               | 10.90    | 10.62    |
| Vietnamese         | 14.64    | 12.69    |
| **Average**        | **16.44**| **12.71**|
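
As a reading aid for the table above, the following minimal sketch computes the relative DER reduction of DiariZen over the Pyannote baseline for each row. The numbers are copied from the table; the helper name `relative_reduction` and the dictionary layout are illustrative assumptions and are not part of the DiariZen package.

```python
# DER values (%) copied from the results table: (Pyannote baseline, DiariZen).
der = {
    "English-American": (20.18, 15.88),
    "English-Australian": (13.76, 10.82),
    "English-British": (18.85, 12.07),
    "English-Filipino": (13.19, 10.28),
    "English-Indian": (8.19, 6.04),
    "French": (22.62, 17.33),
    "German": (22.33, 16.35),
    "Italian": (10.64, 8.85),
    "Japanese": (26.46, 17.81),
    "Korean": (23.25, 16.36),
    "Portuguese": (17.60, 14.77),
    "Russian": (11.37, 9.99),
    "Spanish": (12.92, 10.82),
    "Thai": (10.90, 10.62),
    "Vietnamese": (14.64, 12.69),
}

# "Average" row as reported in the table (not recomputed here).
reported_average = (16.44, 12.71)


def relative_reduction(baseline: float, system: float) -> float:
    """Relative DER reduction of `system` over `baseline`, in percent.

    Hypothetical helper for illustration only.
    """
    return 100.0 * (baseline - system) / baseline


# Per-dataset relative reduction, followed by the reported average row.
for name, (baseline, system) in der.items():
    print(f"{name:<20} {relative_reduction(baseline, system):5.1f}%")
print(f"{'Average (reported)':<20} {relative_reduction(*reported_average):5.1f}%")
```

Relative reduction is used here because the absolute DER varies widely across datasets, so absolute differences alone are hard to compare between rows.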