ggmbr committed on
Commit 2dc845f · 1 Parent(s): 8822d54

documentation

Files changed (1): README.md (+2, -1)
README.md CHANGED
@@ -13,7 +13,7 @@ datasets:
 - VoxCeleb
 ---
 
-# Non-timbral Embeddings extractor
+# Timbral Embeddings extractor
 This model produces embeddings that globally represent the timbral traits of a speaker's voice. These embeddings can be used in the same way as classical speaker verification (ASV) embeddings:
 to compare two voice signals, an embedding vector must be computed for each of them. The cosine similarity between the two embeddings can then be used for comparison.
 The main difference from classical ASV embeddings is that, here, only the timbral traits are compared.
@@ -67,6 +67,7 @@ Please note that the EER value can vary a little depending on the max_size defin
 
 # Limitations
 The fine-tuning data used to produce this model (VoxCeleb, VCTK) are mostly in English, which may affect the performance on other languages.
+The performance may also vary with the audio quality (recording device, background noise, ...), especially for audio qualities not covered by the training set, as no specific algorithm, e.g. data augmentation, was used during training to tackle this problem.
 
 # Publication
 Details about the method used to build this model have been published at Interspeech 2024 in the paper entitled
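The comparison step described in the README (one embedding per voice signal, then cosine similarity) can be sketched as follows. This is a minimal illustration, not the model's own inference code: the two embedding vectors are hypothetical placeholders standing in for the model's outputs, and the decision threshold is an assumption that would need tuning on a validation set.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; in practice each vector would be produced
# by the model from one voice signal.
emb_1 = np.array([0.1, 0.8, -0.3])
emb_2 = np.array([0.2, 0.7, -0.1])

score = cosine_similarity(emb_1, emb_2)

# Scores close to 1.0 suggest similar timbral traits; an assumed
# threshold (to be tuned on held-out data) turns the score into
# a same-speaker / different-speaker decision.
print(f"similarity = {score:.3f}")
```

Cosine similarity is the same scoring rule used in classical ASV back-ends, which is why the README notes these embeddings can be compared "the same way".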