|
---
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- speaker-diarization
license: mit
language:
- en
---
|
# Configuration |
|
This model card outlines the setup of a speaker diarization model fine-tuned on synthetic medical audio data.
|
|
|
Before starting, please ensure the following requirements are met:
|
|
|
1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio` |
|
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions |
|
3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions |
|
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).

5. Download the `pytorch_model.bin` and `config.yaml` files from this repository into a local directory (a download sketch follows this list).
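
Step 5 can be scripted with `huggingface_hub` (a minimal sketch; the `repo_id` below is a placeholder for this repository's actual id, and `local_dir` matches the `model_path` used in the usage examples):

```python
from huggingface_hub import hf_hub_download

# Placeholder: replace with the actual id of this repository
repo_id = "your-username/your-finetuned-diarization-model"

# Download the fine-tuned weights and pipeline hyperparameters
hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin", local_dir="models/pyannote_sd_normal")
hf_hub_download(repo_id=repo_id, filename="config.yaml", local_dir="models/pyannote_sd_normal")
```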
|
|
|
## Usage |
|
|
|
### Load trained segmentation model |
|
```python
import torch
from pyannote.audio import Model

# Load the original segmentation architecture; replace use_auth_token=True
# with your own access token if you are not logged in locally
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Path to the directory containing the downloaded files
model_path = "models/pyannote_sd_normal"

# Load the fine-tuned weights from pytorch_model.bin
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin"))
```
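
If the checkpoint was saved on GPU and you are loading it on a CPU-only machine, `torch.load` needs an explicit `map_location`; a minimal sketch:

```python
# Map GPU-saved weights onto CPU when no GPU is available
state_dict = torch.load(model_path + "/pytorch_model.bin", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
```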
|
### Load fine-tuned speaker diarization pipeline |
|
```python
from pyannote.audio import Pipeline
from pyannote.audio.pipelines import SpeakerDiarization

# Initialize the pretrained pipeline; replace use_auth_token=True
# with your own access token if you are not logged in locally
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True,
)

# Assemble a pipeline around the fine-tuned segmentation model, reusing
# the pretrained embedding and clustering components ("klustering" is the
# actual attribute name in pyannote.audio, not a typo)
finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,
)

# Load the fine-tuned hyperparameters into the pipeline
finetuned_pipeline.load_params(model_path + "/config.yaml")
```
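
`load_params` reads the tuned hyperparameters from `config.yaml`. Equivalently, they can be passed as a dict via `instantiate`; a sketch with illustrative placeholder values, not the tuned values from this repository's `config.yaml`:

```python
# Values below are illustrative placeholders, not tuned hyperparameters
finetuned_pipeline.instantiate({
    "segmentation": {"min_duration_off": 0.0},
    "clustering": {
        "method": "centroid",
        "min_cluster_size": 12,
        "threshold": 0.7,
    },
})
```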
|
### GPU usage |
|
```python
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu: ", torch.cuda.get_device_name(gpu))
```
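
The pipeline also accepts in-memory audio, which avoids re-reading files from disk when processing many recordings; a minimal sketch assuming `torchaudio` is installed:

```python
import torchaudio

# Load the audio once and pass the waveform directly to the pipeline
waveform, sample_rate = torchaudio.load("path/to/audio.wav")
diarization = finetuned_pipeline({"waveform": waveform, "sample_rate": sample_rate})
```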
|
|
|
### Visualise diarization output |
|
```python
# Run the fine-tuned pipeline on an audio file; in a notebook, evaluating
# `diarization` on its own line renders a timeline visualization
diarization = finetuned_pipeline("path/to/audio.wav")
diarization
```
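
If the number of speakers is known in advance (e.g. a two-party doctor–patient consultation), it can be passed as a constraint when calling the pipeline; a minimal sketch:

```python
# Fix the number of speakers, or bound it from below and above
diarization = finetuned_pipeline("path/to/audio.wav", num_speakers=2)
# diarization = finetuned_pipeline("path/to/audio.wav", min_speakers=2, max_speakers=4)
```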
|
|
|
### View speaker turns, speaker ID, and time |
|
```python
# Print each speaker turn as: start time, end time, speaker label
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```
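
### Evaluate diarization error rate

To check the fine-tuned pipeline against ground truth, the diarization error rate can be computed with `pyannote.metrics`; a sketch assuming a reference annotation is available as an RTTM file (the path and URI key below are placeholders):

```python
from pyannote.database.util import load_rttm
from pyannote.metrics.diarization import DiarizationErrorRate

# load_rttm returns a {uri: Annotation} mapping; "audio" is a placeholder URI
reference = load_rttm("path/to/reference.rttm")["audio"]

metric = DiarizationErrorRate()
der = metric(reference, diarization)
print(f"DER = {der:.1%}")
```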
|
|
|
## Citations |
|
|
|
```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

```bibtex
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```