ggmbr committed on
Commit 2dc845f · 1 Parent(s): 8822d54

documentation

Files changed (1): README.md (+2, -1)
README.md CHANGED
@@ -13,7 +13,7 @@ datasets:
 - VoxCeleb
 ---
 
-# Non-timbral Embeddings extractor
+# Timbral Embeddings extractor
 This model produces embeddings that globally represent the timbral traits of a speaker's voice. These embeddings can be used in the same way as classical speaker verification (ASV) embeddings:
 to compare two voice signals, an embedding vector must be computed for each of them. The cosine similarity between the two embeddings can then be used for comparison.
 The main difference from classical ASV embeddings is that, here, only the timbral traits are compared.
@@ -67,6 +67,7 @@ Please note that the EER value can vary a little depending on the max_size defin
 
 # Limitations
 The fine-tuning data used to produce this model (VoxCeleb, VCTK) are mostly in English, which may affect the performance on other languages.
+The performance may also vary with the audio quality (recording device, background noise, ...), especially for audio qualities not covered by the training set, as no specific algorithm, e.g. data augmentation, was used during training to tackle this problem.
 
 # Publication
 Details about the method used to build this model have been published at Interspeech 2024 in the paper entitled
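The comparison step described in the README (one embedding per voice signal, then cosine similarity) can be sketched as follows. This is a minimal illustration, not the model's own inference code: the two embedding vectors are hypothetical placeholders standing in for the model's outputs, and the decision threshold is an assumption that would need tuning on a validation set.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; in practice each vector would be produced
# by the model from one voice signal.
emb_1 = np.array([0.1, 0.8, -0.3])
emb_2 = np.array([0.2, 0.7, -0.1])

score = cosine_similarity(emb_1, emb_2)

# Scores close to 1.0 suggest similar timbral traits; an assumed
# threshold (to be tuned on held-out data) turns the score into
# a same-speaker / different-speaker decision.
print(f"similarity = {score:.3f}")
```

Cosine similarity is the same scoring rule used in classical ASV back-ends, which is why the README notes these embeddings can be compared "the same way".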