Add documentation for Diarization
docs/options.md (CHANGED, +19 -0)
@@ -80,6 +80,17 @@ number of seconds after the line has finished. For instance, if a line ends at 1
 Note that detected lines in gaps between speech sections will not be included in the prompt
 (if silero-vad or silero-vad-expand-into-gaps is used).

+## Diarization
+
+If checked, Pyannote will be used to detect speakers in the audio and label them as (SPEAKER 00), (SPEAKER 01), etc.
+
+This requires a Hugging Face access token, which can be supplied with the `--auth_token` command line option for the CLI,
+set in the `config.json5` file for the GUI, or provided via the `HK_AUTH_TOKEN` environment variable.
+
+## Diarization - Speakers
+
+The number of speakers to detect. If set to 0, Pyannote will attempt to detect the number of speakers automatically.
+
 # Command Line Options

 Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
@@ -132,3 +143,11 @@ If the average log probability is lower than this value, treat the decoding as f

 ## No speech threshold
 If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. Default is 0.6.
+
+## Diarization - Min Speakers
+
+The minimum number of speakers for Pyannote to detect.
+
+## Diarization - Max Speakers
+
+The maximum number of speakers for Pyannote to detect.
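The Diarization entry in this change lists three places the Hugging Face token can come from, but it does not say which one wins when several are set. The following is a minimal sketch that assumes a precedence of CLI option, then `config.json5` value, then the `HK_AUTH_TOKEN` environment variable. `resolve_auth_token` is a hypothetical helper for illustration, not a function from this project.

```python
import os
from typing import Mapping, Optional


def resolve_auth_token(
    cli_token: Optional[str] = None,
    config_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty token from the CLI, config file, or environment.

    Hypothetical helper: the precedence order below (CLI > config > env) is an
    assumption made for this sketch, not documented behaviour.
    """
    if env is None:
        env = os.environ  # fall back to the real process environment
    for candidate in (cli_token, config_token, env.get("HK_AUTH_TOKEN")):
        if candidate:  # skips both None and empty strings
            return candidate
    return None  # no token found: diarization cannot be enabled
```

Passing `env` explicitly keeps the helper testable without touching the real environment; in normal use it can be called with no `env` argument so that `os.environ` is consulted.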
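pyannote.audio's speaker-diarization pipeline (3.x) accepts either an exact `num_speakers` or `min_speakers`/`max_speakers` bounds as keyword arguments when it is called on an audio file. The sketch below shows one plausible way the Speakers, Min Speakers and Max Speakers options could be turned into those arguments. The function name `diarization_kwargs` and the rules that an exact count overrides the bounds, and that 0 or `None` for a bound means "unbounded", are assumptions for illustration, not the project's actual code.

```python
from typing import Dict, Optional


def diarization_kwargs(
    num_speakers: int = 0,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, int]:
    """Map the speaker-count options to keyword arguments for a pyannote call.

    Hypothetical mapping (assumptions, not documented behaviour):
    - num_speakers > 0 fixes the count, and the min/max bounds are ignored;
    - num_speakers == 0 lets Pyannote detect the count automatically;
    - None or 0 for min/max means "no bound".
    """
    if num_speakers < 0:
        raise ValueError("num_speakers must be >= 0")
    for name, value in (("min_speakers", min_speakers), ("max_speakers", max_speakers)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be >= 0")

    if num_speakers > 0:
        # An exact speaker count makes the bounds redundant.
        return {"num_speakers": num_speakers}

    kwargs: Dict[str, int] = {}
    if min_speakers:  # skips None and 0
        kwargs["min_speakers"] = min_speakers
    if max_speakers:
        kwargs["max_speakers"] = max_speakers
    if min_speakers and max_speakers and min_speakers > max_speakers:
        raise ValueError("min_speakers cannot exceed max_speakers")
    return kwargs
```

The resulting dictionary would be unpacked into the pipeline call (for example `pipeline(audio_path, **diarization_kwargs(0, 2, 4))`), so an empty dictionary leaves speaker-count detection entirely to Pyannote.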