---
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- speaker-diarization
license: mit
language:
- en
---
# Configuration
This model card outlines the setup of a speaker diarization pipeline fine-tuned on synthetic medical audio data.
Before starting, please ensure the following requirements are met:
1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio`
2. Accept the [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept the [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
5. Download the `pytorch_model.bin` and `config.yaml` files from this repository into a local directory (see the sketch after this list)
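The two files can also be fetched programmatically from the Hugging Face Hub. A minimal sketch using `huggingface_hub.hf_hub_download`; the repository id below is a placeholder, replace it with the actual id of this model repository:
```python
from huggingface_hub import hf_hub_download

# "user/finetuned-speaker-diarization" is a placeholder repo id; replace it
# with the actual id of this model repository on the Hugging Face Hub
repo_id = "user/finetuned-speaker-diarization"
for filename in ("pytorch_model.bin", "config.yaml"):
    local_path = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        local_dir="models/pyannote_sd_normal",  # matches model_path used below
    )
    print("downloaded:", local_path)
```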
## Usage
### Load trained segmentation model
```python
import torch
from pyannote.audio import Model

# Load the base architecture; with use_auth_token=True, pyannote reads the
# access token saved by `huggingface-cli login` (or pass your token string)
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Directory containing the downloaded pytorch_model.bin and config.yaml
model_path = "models/pyannote_sd_normal"

# Load the fine-tuned weights; map_location="cpu" keeps this working on
# machines without a GPU
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin", map_location="cpu"))
```
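If loading fails with unexpected keys, the file may be a full training checkpoint rather than a bare state dict. A hedged sketch that handles both layouts, assuming a Lightning-style checkpoint nests the weights under a `state_dict` key:
```python
checkpoint = torch.load(model_path + "/pytorch_model.bin", map_location="cpu")
# A bare state dict loads directly; a full checkpoint nests it under "state_dict"
state_dict = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state_dict)
```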
### Load fine-tuned speaker diarization pipeline
```python
from pyannote.audio import Pipeline
from pyannote.audio.pipelines import SpeakerDiarization
from pyannote.metrics.diarization import DiarizationErrorRate

# Initialize the pretrained pipeline; use_auth_token=True reads the token
# saved by `huggingface-cli login` (or pass your token string)
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True,
)

# Build a pipeline around the fine-tuned segmentation model, reusing the
# pretrained embedding and clustering components ("klustering" is the
# attribute name used by pyannote.audio, not a typo)
finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,
)

# Load fine-tuned hyperparameters into the pipeline
finetuned_pipeline.load_params(model_path + "/config.yaml")
```
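The `DiarizationErrorRate` import above can be used to sanity-check the fine-tuned pipeline against a reference annotation. A minimal sketch, assuming a hypothetical ground-truth RTTM file `path/to/reference.rttm` for the same audio:
```python
from pyannote.database.util import load_rttm

# load_rttm returns a {uri: Annotation} mapping; take the first annotation
reference = next(iter(load_rttm("path/to/reference.rttm").values()))
hypothesis = finetuned_pipeline("path/to/audio.wav")

metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")
```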
### GPU usage
```python
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu:", torch.cuda.get_device_name(gpu))
```
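For long recordings, a progress readout can help confirm the pipeline is actually running. A sketch using pyannote's built-in `ProgressHook` (available in `pyannote.audio` 3.x):
```python
from pyannote.audio.pipelines.utils.hook import ProgressHook

# Show per-step progress while the pipeline processes the file
with ProgressHook() as hook:
    diarization = finetuned_pipeline("path/to/audio.wav", hook=hook)
```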
### Visualise diarization output
```python
# Apply the fine-tuned pipeline to an audio file
diarization = finetuned_pipeline("path/to/audio.wav")

# In a Jupyter notebook, the returned Annotation renders as a timeline plot
diarization
```
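The output can also be saved in the standard RTTM format via the `Annotation.write_rttm` method from `pyannote.core`:
```python
# Save the diarization output in RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```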
### View speaker turns, speaker ID, and time
```python
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```
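To summarise total speaking time per speaker, `Annotation.chart()` returns `(label, duration)` pairs sorted by duration; a short sketch:
```python
# Total speaking time per speaker, in seconds, longest first
for speaker, duration in diarization.chart():
    print(f"{speaker}: {duration:.1f}s")
```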
## Citations
```bibtex
@inproceedings{Plaquet23,
author={Alexis Plaquet and Hervé Bredin},
title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
```
```bibtex
@inproceedings{Bredin23,
author={Hervé Bredin},
title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
```