documentation
README.md CHANGED
@@ -13,7 +13,7 @@ datasets:
 - VoxCeleb
 ---
 
-#
+# Timbral Embeddings extractor
 This model produces embeddings that globally represent the timbral traits of a speaker's voice. These embeddings can be used in the same way as for classical speaker verification (ASV):
 to compare two voice signals, an embedding vector must be computed for each of them, and the cosine similarity between the two embeddings can then be used for comparison.
 The main difference with classical ASV embeddings is that, here, only the timbral traits are compared.
@@ -67,6 +67,7 @@ Please note that the EER value can vary a little depending on the max_size defin
 
 # Limitations
 The fine-tuning data used to produce this model (VoxCeleb, VCTK) are mostly in English, which may affect the performance on other languages.
+The performance may also vary with the audio quality (recording device, background noise, ...), especially for audio conditions not covered by the training set, since no specific technique (e.g. data augmentation) was used during training to address this.
 
 # Publication
 Details about the method used to build this model have been published at Interspeech 2024 in the paper entitled
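As a concrete illustration of the comparison step described in the README excerpt above (one embedding per voice signal, then cosine similarity), here is a minimal sketch. The embedding extraction itself is not part of this diff, so `extract_embedding` below is a hypothetical placeholder; only the cosine-similarity comparison is taken from the README text.

```python
import numpy as np

def cosine_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    a = emb_a.ravel()
    b = emb_b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: `extract_embedding` stands in for this model's own
# embedding extraction step, which is not shown in the diff.
# emb_1 = extract_embedding("voice_1.wav")
# emb_2 = extract_embedding("voice_2.wav")
# score = cosine_similarity(emb_1, emb_2)
# A higher score indicates more similar timbral traits between the two voices.
```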