NeMo
popcornell committed
Commit d66d826 · 1 Parent(s): 3b6d157

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -25,7 +25,7 @@ on [VoxCeleb1&2 datasets](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.ht
  - MSDD Reference: [Park et al. (2022)](https://arxiv.org/pdf/2203.15974.pdf)
  - MSDD-v2 speaker diarization system employs a multi-scale embedding approach and utilizes TitaNet speaker embedding extractor.
  - TitaNet Reference: [Koluguri et al. (2022)](https://arxiv.org/abs/2110.04410)
- - TitaNet Model is included in [MSDD-v2 .nemo checkpoint file]((https://huggingface.co/chime-dasr/nemo_baseline_models/blob/main/MSDD_v2_PALO_100ms_intrpl_3scales.nemo)).
+ - TitaNet Model is included in [MSDD-v2 .nemo checkpoint file](https://huggingface.co/chime-dasr/nemo_baseline_models/blob/main/MSDD_v2_PALO_100ms_intrpl_3scales.nemo).
  - Unlike the system that uses a multi-layer LSTM architecture, we employ a four-layer Transformer architecture with a hidden size of 384.
  - This neural model generates logit values indicating speaker existence.
  - Our diarization model is trained on approximately 3,000 hours of simulated audio mixture data from the same multi-speaker data simulator used in VAD model training, drawing from VoxCeleb1&2 and LibriSpeech datasets.
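The checkpoint linked in the fixed line above can be pulled directly from the Hub. The sketch below is illustrative and not part of the baseline repo: the repo id and filename are taken from the link itself, while the `EncDecDiarLabelModel` class is an assumption based on NeMo's generic `.nemo` loading API; the MSDD-v2 Transformer variant may ship its own model class in the CHiME baseline code.

```python
# Illustrative sketch, not from the baseline repo. Repo id and filename come from
# the checkpoint link in the README; the model class is an assumption -- MSDD-v2
# may define its own class in the CHiME baseline code.
from huggingface_hub import hf_hub_download
from nemo.collections.asr.models import EncDecDiarLabelModel

ckpt_path = hf_hub_download(
    repo_id="chime-dasr/nemo_baseline_models",
    filename="MSDD_v2_PALO_100ms_intrpl_3scales.nemo",
)

# restore_from() is NeMo's generic loader for .nemo archives; per the README,
# the TitaNet embedding extractor is bundled inside this same checkpoint.
msdd_model = EncDecDiarLabelModel.restore_from(restore_path=ckpt_path)
```

The four-layer Transformer with hidden size 384 that replaces the multi-layer LSTM can be pictured with a minimal PyTorch sketch; the number of attention heads, the sequence length, and the maximum speaker count below are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

# Minimal sketch of the described architecture: a 4-layer Transformer encoder with
# hidden size 384 mapping per-frame multi-scale embedding features to per-speaker
# existence logits. nhead=8 and a 4-speaker cap are illustrative assumptions.
encoder_layer = nn.TransformerEncoderLayer(d_model=384, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
speaker_head = nn.Linear(384, 4)  # one logit per (frame, speaker)

frames = torch.randn(2, 200, 384)        # (batch, time, feature) dummy input
logits = speaker_head(encoder(frames))   # (2, 200, 4) speaker-existence logits
probs = torch.sigmoid(logits)            # per-frame speaker activity probabilities
```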