---
license: mit
library_name: transformers
pipeline_tag: voice-activity-detection
tags:
- speaker
- speaker-diarization
- meeting
- wavlm
- wespeaker
- diarizen
- pyannote
- pyannote-audio-pipeline
---

## Overview
This repository hosts the pre-trained [DiariZen](https://github.com/BUTSpeechFIT/DiariZen) model described in [BUT System for the MLC-SLM Challenge](https://huggingface.co/papers/2506.13414). Its end-to-end neural diarization (EEND) component is built on WavLM-Large and Conformer layers. The model was first pre-trained on far-field, single-channel audio from a diverse set of public datasets, including AMI, AISHELL-4, AliMeeting, NOTSOFAR-1, MSDWild, DIHARD3, RAMC, and VoxConverse. Structured pruning at 80% sparsity was then applied, and finally the pruned model was fine-tuned on [MLC-SLM](https://www.nexdata.ai/competition/mlc-slm) data.
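
To make "structured pruning" concrete: it removes whole units (for example entire rows of a weight matrix, i.e. output channels) rather than scattered individual weights, so the pruned layer genuinely shrinks. The sketch below is a simplified stand-in that assumes plain L2-norm ranking of rows; it is not the pruning procedure used to build this model, and `prune_rows` is a hypothetical helper. In this toy version, 80% sparsity keeps 20% of the rows.

```python
import math


def prune_rows(weight: list[list[float]], sparsity: float) -> tuple[list[int], list[list[float]]]:
    """Structured pruning sketch: drop whole rows (output units) with the smallest L2 norm.

    Hypothetical helper for illustration only; not DiariZen's pruning method.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Number of rows that survive, e.g. 20% of the rows at 80% sparsity
    n_keep = max(1, round(len(weight) * (1.0 - sparsity)))
    norms = [math.sqrt(sum(w * w for w in row)) for row in weight]
    # Rank rows by norm, keep the top n_keep, then restore the original row order
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return keep, [weight[i] for i in keep]


# Toy 10x2 weight matrix whose row norms grow with the row index
weight = [[float(k), 1.0] for k in range(10)]
keep, pruned = prune_rows(weight, sparsity=0.8)
print(keep)         # [8, 9]
print(len(pruned))  # 2 of 10 rows survive
```

Because entire rows disappear, the next layer's input dimension shrinks as well, which is why structured pruning gives real speed and memory savings without needing sparse kernels.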


## Usage
```python
from diarizen.pipelines.inference import DiariZenPipeline

# load pre-trained model
diar_pipeline = DiariZenPipeline.from_pretrained("BUT-FIT/diarizen-wavlm-large-s80-mlc")
# apply diarization pipeline
diar_results = diar_pipeline('audio.wav')

# print results
for turn, _, speaker in diar_results.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")

# load pre-trained model and also save results in RTTM format to rttm_out_dir
diar_pipeline = DiariZenPipeline.from_pretrained(
    "BUT-FIT/diarizen-wavlm-large-s80-mlc",
    rttm_out_dir='.',
)
# apply diarization pipeline; sess_name labels this session in the saved RTTM output
diar_results = diar_pipeline('audio.wav', sess_name='session_name')
```
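
With `rttm_out_dir` set, results are saved in RTTM, the standard NIST text format for diarization output. Each `SPEAKER` line has ten whitespace-separated fields, among them the file ID, channel, turn onset, turn duration and speaker name. The sketch below sums speaking time per speaker from RTTM text; `speaker_durations` is a hypothetical helper, and the sample lines and speaker labels are made up for illustration rather than copied from DiariZen output.

```python
from collections import defaultdict


def speaker_durations(rttm_text: str) -> dict[str, float]:
    """Sum turn durations (seconds) per speaker from RTTM-formatted text.

    RTTM SPEAKER fields: type, file-id, channel, onset, duration,
    ortho, speaker-type, speaker-name, confidence, lookahead.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and records that are not SPEAKER turns
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # turn duration in seconds
        speaker = fields[7]          # speaker name
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM content for illustration only
sample = """\
SPEAKER session_name 1 0.00 2.50 <NA> <NA> 0 <NA> <NA>
SPEAKER session_name 1 2.60 1.25 <NA> <NA> 1 <NA> <NA>
SPEAKER session_name 1 4.00 0.75 <NA> <NA> 0 <NA> <NA>
"""

print(speaker_durations(sample))  # {'0': 3.25, '1': 1.25}
```

To apply it to a file written by the pipeline, pass the file contents instead of `sample`; the exact file name inside `rttm_out_dir` (for example one derived from `sess_name`) is an assumption worth checking after a run.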

## Results 

Diarization error rate (DER, %; lower is better) of the [Pyannote baseline](https://github.com/mubingshen/MLC-SLM-Baseline) and DiariZen, with **no collar** applied.

| Dataset            | Pyannote | DiariZen |
|:-------------------|:--------:|:--------:|
| English-American   | 20.18    | 15.88    |
| English-Australian | 13.76    | 10.82    |
| English-British    | 18.85    | 12.07    |
| English-Filipino   | 13.19    | 10.28    |
| English-Indian     | 8.19     | 6.04     |
| French             | 22.62    | 17.33    |
| German             | 22.33    | 16.35    |
| Italian            | 10.64    | 8.85     |
| Japanese           | 26.46    | 17.81    |
| Korean             | 23.25    | 16.36    |
| Portuguese         | 17.60    | 14.77    |
| Russian            | 11.37    | 9.99     |
| Spanish            | 12.92    | 10.82    |
| Thai               | 10.90    | 10.62    |
| Vietnamese         | 14.64    | 12.69    |
| **Average**        | **16.44**| **12.71**|
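
DER is the fraction of reference speech time that is missed, falsely detected, or attributed to the wrong speaker after the best one-to-one mapping between hypothesis and reference speakers. "No collar" means every frame is scored, including those right at speaker-turn boundaries. The frame-level sketch below illustrates the metric under simplifying assumptions (fixed-length frames, brute-force speaker mapping that is only practical for a handful of speakers); it is not the scoring tool used to produce the table above, and `frame_der` is a hypothetical helper.

```python
from itertools import permutations


def frame_der(ref: list[set[str]], hyp: list[set[str]]) -> float:
    """Frame-level DER with no collar: every frame is scored.

    ref, hyp: one set of active speaker labels per frame (equal lengths).
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    total_ref = sum(len(r) for r in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    ref_spk = sorted(set().union(*ref))
    hyp_spk = sorted(set().union(*hyp))
    # Padding with None lets any hypothesis speaker stay unmapped
    candidates = ref_spk + [None] * len(hyp_spk)

    # Brute-force the one-to-one mapping that maximizes correctly attributed speech
    best_correct = 0
    for assignment in permutations(candidates, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assignment))
        correct = sum(len(r & {mapping[s] for s in h}) for r, h in zip(ref, hyp))
        best_correct = max(best_correct, correct)

    miss = sum(max(0, len(r) - len(h)) for r, h in zip(ref, hyp))
    false_alarm = sum(max(0, len(h) - len(r)) for r, h in zip(ref, hyp))
    confusion = sum(min(len(r), len(h)) for r, h in zip(ref, hyp)) - best_correct
    return (miss + false_alarm + confusion) / total_ref


# 5 frames: frame 3 is confused (s1 mapped to A, but B speaks), frame 5 is a false alarm
ref = [{"A"}, {"A"}, {"B"}, {"B"}, set()]
hyp = [{"s1"}, {"s1"}, {"s1"}, {"s2"}, {"s2"}]
print(frame_der(ref, hyp))  # 0.5
```

The brute-force mapping grows factorially with the number of speakers; established scoring tools such as `pyannote.metrics` solve the mapping with an optimal assignment algorithm and score time segments rather than fixed frames.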