ggmbr committed
Commit 7c7793f · 1 Parent(s): 2dc845f
Files changed (1):
  1. README.md +6 -6
README.md CHANGED
@@ -31,8 +31,8 @@ Its weights are then downloaded from this repository.
 from spk_embeddings import EmbeddingsModel, compute_embedding
 import torch
 
-nt_extractor = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-tbr")
-nt_extractor.eval()
+model = EmbeddingsModel.from_pretrained("Orange/Speaker-wavLM-tbr")
+model.eval()
 ```
 
 The model produces normalized vectors as embeddings.
@@ -48,8 +48,8 @@ finally, we can compute two embeddings from two different files and compare them
 wav1 = "/voxceleb1_2019/test/wav/id10270/x6uYqmx31kE/00001.wav"
 wav2 = "/voxceleb1_2019/test/wav/id10270/8jEAjG6SegY/00008.wav"
 
-e1 = compute_embedding(wav1, nt_extractor)
-e2 = compute_embedding(wav2, nt_extractor)
+e1 = compute_embedding(wav1, model)
+e2 = compute_embedding(wav2, model)
 sim = float(torch.matmul(e1,e2.t()))
 
 print(sim) # 0.7743815779685974
@@ -58,8 +58,8 @@ print(sim) # 0.7743815779685974
 # Evaluations
 Although it is not directly designed for this use case, evaluation on a standard ASV task can be performed with this model. Applied to
 the [VoxCeleb1-clean test set](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test2.txt), it leads to an equal error rate
-(EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **10.681%**
-(with a decision threshold of **0.467**).
+(EER, lower value denotes a better identification, random prediction leads to a value of 50%) of **1.685%**
+(with a decision threshold of **0.472**).
 This value can be interpreted as the ability to identify speakers only with timbral cues. A discussion about this interpretation can be
 found in the paper mentioned hereafter, as well as other experiments showing correlations between these embeddings and timbral voice attributes.
 
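To make the EER and decision threshold in the updated paragraph concrete, here is a small self-contained sketch. The target/impostor scores below are made up for illustration (they are not outputs of this model): it sweeps a decision threshold over cosine-similarity trial scores and reports the point where the false-rejection and false-acceptance rates meet.

```python
# Hypothetical toy scores, standing in for real trial scores:
# cosine similarities for same-speaker (target) and different-speaker (impostor) trials.
target = [0.82, 0.74, 0.69, 0.55, 0.48]
impostor = [0.51, 0.44, 0.39, 0.30, 0.22]

def rates(threshold):
    # False rejection: same-speaker trials scoring below the threshold.
    frr = sum(s < threshold for s in target) / len(target)
    # False acceptance: different-speaker trials scoring at or above the threshold.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far

# Sweep candidate thresholds; the EER is the operating point where FRR and FAR cross.
gap, thr = min((abs(rates(t)[0] - rates(t)[1]), t) for t in sorted(target + impostor))
frr, far = rates(thr)
eer = (frr + far) / 2
print(f"threshold={thr:.2f}  EER={eer:.1%}")  # threshold=0.51  EER=20.0%
```

The same sweep over the real VoxCeleb1-clean trial scores is what yields the reported EER and its associated threshold.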