Update README.md
README.md
@@ -196,7 +196,7 @@ This model accepts single-channel (mono) audio sampled at 16,000 Hz.
 The output of the model is a T x S matrix, where:
 - S is the maximum number of speakers (in this model, S = 4).
 - T is the total number of frames, including zero-padding. Each frame corresponds to a segment of 0.08 seconds of audio.
-Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range. For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds.
+- Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range. For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds.


 ## Train and evaluate Sortformer diarizer using NeMo
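The T x S output format described in this hunk can be illustrated with a small Python sketch that maps frame indices to time ranges and merges consecutive active frames into per-speaker segments. It assumes 0-based indexing on both axes (so frame 150 spans [12.00, 12.08] s, matching the README's example, and column 1 is the second speaker) and an illustrative activity threshold of 0.5. The helper names `frame_to_time_range`, `frame_start` and `probs_to_segments` are hypothetical and are not part of the NeMo API.

```python
from typing import List, Optional, Sequence, Tuple

FRAME_SEC = 0.08  # frame duration stated in the README


def frame_start(t: int) -> float:
    """Start time in seconds of 0-indexed frame t (rounded to avoid float noise)."""
    return round(t * FRAME_SEC, 2)


def frame_to_time_range(t: int) -> Tuple[float, float]:
    """[start, end) time range in seconds covered by 0-indexed frame t."""
    return frame_start(t), frame_start(t + 1)


def probs_to_segments(
    probs: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, float, float]]:
    """Convert a T x S probability matrix into (speaker, start_s, end_s) segments.

    A speaker counts as active in a frame when its probability is >= threshold
    (0.5 is an assumed value, not taken from the model card). Runs of consecutive
    active frames are merged into a single segment.
    """
    num_frames = len(probs)
    num_speakers = len(probs[0]) if num_frames else 0
    segments: List[Tuple[int, float, float]] = []
    for s in range(num_speakers):
        start: Optional[int] = None  # first frame of the currently open segment
        for t in range(num_frames):
            active = probs[t][s] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                # Segment ends where the first inactive frame begins.
                segments.append((s, frame_start(start), frame_start(t)))
                start = None
        if start is not None:  # segment still open at the last frame
            segments.append((s, frame_start(start), frame_start(num_frames)))
    return segments
```

For example, on the toy 4-frame, 2-speaker matrix `[[0.1, 0.9], [0.7, 0.95], [0.8, 0.2], [0.2, 0.1]]`, `probs_to_segments` returns `[(0, 0.08, 0.24), (1, 0.0, 0.16)]`. A fixed threshold is the simplest choice; real post-processing may add smoothing or minimum-duration rules.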