---
tags:
- pyannote
- pyannote-audio
- pyannote-audio-pipeline
- speaker-diarization
license: mit
language:
- en
---
# Configuration
This model card outlines the setup of a speaker diarization pipeline fine-tuned on synthetic medical audio data.
Before starting, please ensure the following requirements are met:
1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio`
2. Accept the [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept the [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
5. Download the `pytorch_model.bin` and `config.yaml` files from this repository into a local directory (see the sketch after this list)
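The two files can also be fetched programmatically from the Hugging Face Hub. A minimal sketch using `huggingface_hub.hf_hub_download`; the repository id below is a placeholder, replace it with the actual id of this model repository:
```python
from huggingface_hub import hf_hub_download

# "user/finetuned-speaker-diarization" is a placeholder repo id; replace it
# with the actual id of this model repository on the Hugging Face Hub
repo_id = "user/finetuned-speaker-diarization"
for filename in ("pytorch_model.bin", "config.yaml"):
    local_path = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        local_dir="models/pyannote_sd_normal",  # matches model_path used below
    )
    print("downloaded:", local_path)
```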
## Usage
### Load trained segmentation model
```python
import torch
from pyannote.audio import Model

# Load the base architecture; with use_auth_token=True, pyannote reads the
# access token saved by `huggingface-cli login` (or pass your token string)
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Directory containing the downloaded pytorch_model.bin and config.yaml
model_path = "models/pyannote_sd_normal"

# Load the fine-tuned weights; map_location="cpu" keeps this working on
# machines without a GPU
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin", map_location="cpu"))
```
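If loading fails with unexpected keys, the file may be a full training checkpoint rather than a bare state dict. A hedged sketch that handles both layouts, assuming a Lightning-style checkpoint nests the weights under a `state_dict` key:
```python
checkpoint = torch.load(model_path + "/pytorch_model.bin", map_location="cpu")
# A bare state dict loads directly; a full checkpoint nests it under "state_dict"
state_dict = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state_dict)
```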
### Load fine-tuned speaker diarization pipeline
```python
from pyannote.audio import Pipeline
from pyannote.audio.pipelines import SpeakerDiarization
from pyannote.metrics.diarization import DiarizationErrorRate

# Initialize the pretrained pipeline; use_auth_token=True reads the token
# saved by `huggingface-cli login` (or pass your token string)
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True,
)

# Build a pipeline around the fine-tuned segmentation model, reusing the
# pretrained embedding and clustering components ("klustering" is the
# attribute name used by pyannote.audio, not a typo)
finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,
)

# Load fine-tuned hyperparameters into the pipeline
finetuned_pipeline.load_params(model_path + "/config.yaml")
```
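The `DiarizationErrorRate` import above can be used to sanity-check the fine-tuned pipeline against a reference annotation. A minimal sketch, assuming a hypothetical ground-truth RTTM file `path/to/reference.rttm` for the same audio:
```python
from pyannote.database.util import load_rttm

# load_rttm returns a {uri: Annotation} mapping; take the first annotation
reference = next(iter(load_rttm("path/to/reference.rttm").values()))
hypothesis = finetuned_pipeline("path/to/audio.wav")

metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")
```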
### GPU usage
```python
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu:", torch.cuda.get_device_name(gpu))
```
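For long recordings, a progress readout can help confirm the pipeline is actually running. A sketch using pyannote's built-in `ProgressHook` (available in `pyannote.audio` 3.x):
```python
from pyannote.audio.pipelines.utils.hook import ProgressHook

# Show per-step progress while the pipeline processes the file
with ProgressHook() as hook:
    diarization = finetuned_pipeline("path/to/audio.wav", hook=hook)
```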
### Visualise diarization output
```python
# Apply the fine-tuned pipeline to an audio file
diarization = finetuned_pipeline("path/to/audio.wav")

# In a Jupyter notebook, the returned Annotation renders as a timeline plot
diarization
```
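The output can also be saved in the standard RTTM format via the `Annotation.write_rttm` method from `pyannote.core`:
```python
# Save the diarization output in RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```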
### View speaker turns, speaker ID, and time
```python
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```
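To summarise total speaking time per speaker, `Annotation.chart()` returns `(label, duration)` pairs sorted by duration; a short sketch:
```python
# Total speaking time per speaker, in seconds, longest first
for speaker, duration in diarization.chart():
    print(f"{speaker}: {duration:.1f}s")
```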
## Citations
```bibtex
@inproceedings{Plaquet23,
author={Alexis Plaquet and Hervé Bredin},
title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
```
```bibtex
@inproceedings{Bredin23,
author={Hervé Bredin},
title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
```