nvidia/diar_sortformer_4spk-v1

Audio Classification

speaker-diarization

speaker-recognition

taejinp committed on Dec 18, 2024

Commit d2207a6 · verified · 1 Parent(s): ca79a82

Update README.md

Files changed (1)

README.md +1 -1

@@ -196,7 +196,7 @@ This model accepts single-channel (mono) audio sampled at 16,000 Hz.
 The output of the model is a T x S matrix, where:
 - S is the maximum number of speakers (in this model, S = 4).
 - T is the total number of frames, including zero-padding. Each frame corresponds to a segment of 0.08 seconds of audio.
-Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range.  For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds.
+- Each element of the T x S matrix represents the speaker activity probability in the [0, 1] range.  For example, a matrix element a(150, 2) = 0.95 indicates a 95% probability of activity for the second speaker during the time range [12.00, 12.08] seconds.
 ## Train and evaluate Sortformer diarizer using NeMo
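
The frame-to-time mapping described in the changed README lines can be sketched in a few lines of Python. This is only an illustration, not NeMo's post-processing: the helper names `frame_to_time_range` and `active_segments` are hypothetical, the 0.5 threshold is an assumed value, and the frame index is treated as 0-based so that frame 150 maps to [12.00, 12.08] seconds as in the README example.

```python
FRAME_SEC = 0.08  # frame duration stated in the README


def frame_to_time_range(frame_index):
    """Return (start, end) in seconds for a 0-based frame index."""
    start = round(frame_index * FRAME_SEC, 2)
    end = round((frame_index + 1) * FRAME_SEC, 2)
    return start, end


def active_segments(probs, threshold=0.5):
    """Convert a T x S probability matrix (list of rows) into per-speaker
    (start, end) segments by simple thresholding.

    The threshold is an assumption for illustration only.
    """
    num_speakers = len(probs[0]) if probs else 0
    segments = {s: [] for s in range(num_speakers)}
    for s in range(num_speakers):
        run_start = None  # first frame of the current active run, if any
        for t, row in enumerate(probs):
            active = row[s] >= threshold
            if active and run_start is None:
                run_start = t
            elif not active and run_start is not None:
                segments[s].append(
                    (frame_to_time_range(run_start)[0],
                     frame_to_time_range(t - 1)[1])
                )
                run_start = None
        if run_start is not None:  # close a run that reaches the last frame
            segments[s].append(
                (frame_to_time_range(run_start)[0],
                 frame_to_time_range(len(probs) - 1)[1])
            )
    return segments


# README example: frame 150 covers [12.00, 12.08] seconds.
print(frame_to_time_range(150))
```

Speakers are keyed by 0-based column index here, so the README's "second speaker" corresponds to key `1`. Overlapping speech falls out naturally, since each column is thresholded independently.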