yangwang825 committed on
Commit 770dac0 · verified · 1 Parent(s): 15387ce

Model save

Files changed (3)
  1. README.md +18 -32
  2. model.safetensors +1 -1
  3. tdnn_attention.py +12 -7
README.md CHANGED
@@ -1,28 +1,14 @@
 ---
+library_name: transformers
+tags:
+- generated_from_trainer
 datasets:
 - voxceleb
-library_name: transformers
 metrics:
 - accuracy
-tags:
-- audio-classification
-- generated_from_trainer
 model-index:
 - name: ecapa-tdnn-voxceleb1-c512-aam
-  results:
-  - task:
-      type: audio-classification
-      name: Audio Classification
-    dataset:
-      name: confit/voxceleb
-      type: voxceleb
-      config: verification
-      split: train
-      args: verification
-    metrics:
-    - type: accuracy
-      value: 0.8030272452068618
-      name: Accuracy
+  results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -30,10 +16,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # ecapa-tdnn-voxceleb1-c512-aam
 
-This model is a fine-tuned version of [](https://huggingface.co/) on the confit/voxceleb dataset.
+This model is a fine-tuned version of [](https://huggingface.co/) on the voxceleb dataset.
 It achieves the following results on the evaluation set:
-- Loss: 4.7003
-- Accuracy: 0.8030
+- Loss: nan
+- Accuracy: 0.0007
 
 ## Model description
 
@@ -52,7 +38,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.0001
+- learning_rate: 0.0005
 - train_batch_size: 256
 - eval_batch_size: 1
 - seed: 914
@@ -66,16 +52,16 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Accuracy |
 |:-------------:|:-----:|:----:|:---------------:|:--------:|
-| 11.3851       | 1.0   | 523  | 11.0293         | 0.1806   |
-| 9.7596        | 2.0   | 1046 | 9.1401          | 0.3850   |
-| 8.7136        | 3.0   | 1569 | 7.8821          | 0.5242   |
-| 7.848         | 4.0   | 2092 | 6.9451          | 0.6144   |
-| 7.1912        | 5.0   | 2615 | 6.2630          | 0.6821   |
-| 6.6763        | 6.0   | 3138 | 5.7182          | 0.7292   |
-| 6.3112        | 7.0   | 3661 | 5.2653          | 0.7632   |
-| 6.0255        | 8.0   | 4184 | 4.9663          | 0.7826   |
-| 5.8091        | 9.0   | 4707 | 4.7787          | 0.7957   |
-| 5.7269        | 10.0  | 5230 | 4.7003          | 0.8030   |
+| 9.047         | 1.0   | 575  | 8.3662          | 0.4304   |
+| 5.3508        | 2.0   | 1150 | 4.0252          | 0.8191   |
+| 3.3124        | 3.0   | 1725 | 2.1083          | 0.9260   |
+| 2.3212        | 4.0   | 2300 | 1.2224          | 0.9435   |
+| 1.6276        | 5.0   | 2875 | 0.8229          | 0.9677   |
+| 1.1418        | 6.0   | 3450 | 0.5840          | 0.9758   |
+| 1.0484        | 7.0   | 4025 | 0.5781          | 0.9738   |
+| 0.0           | 8.0   | 4600 | nan             | 0.0007   |
+| 0.0           | 9.0   | 5175 | nan             | 0.0007   |
+| 0.0           | 10.0  | 5750 | nan             | 0.0007   |
 
 
 ### Framework versions
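
Note that the updated training log shows the loss collapsing to 0.0 and the validation loss becoming nan from epoch 8 onward, a typical sign of numerical divergence (possibly aggravated by the learning-rate increase from 1e-4 to 5e-4). A minimal, hypothetical guard against this failure mode, not taken from this repo, is to skip optimizer steps whenever the loss is non-finite; the `safe_step` name and its interface are illustrative assumptions:

```python
import torch

def safe_step(loss: torch.Tensor, optimizer: torch.optim.Optimizer) -> bool:
    """Apply an optimizer step only when the loss is finite.

    Returns True if the step was taken, False if the batch was skipped.
    """
    if not torch.isfinite(loss):
        # NaN/inf loss: discard this batch instead of poisoning the weights
        optimizer.zero_grad(set_to_none=True)
        return False
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```

A guard like this keeps one divergent batch from wiping out seven epochs of progress, as happened in the table above.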
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d5d7984fc542731fbba8c59c4d9af1ce1d3673bc91f68c6f88044a29373d6b6
+oid sha256:01ab04c49d211b9cb78118ce27d28f1f04761dd870ee7e94728be21790d620cd
 size 26039912
tdnn_attention.py CHANGED
@@ -214,16 +214,21 @@ class MaskedSEModule(nn.Module):
             nn.Sigmoid(),
         )
 
-    def forward(self, input, length=None):
+    def forward(self, inputs, length=None):
+        """
+        inputs: tensor shape of (B, D, T)
+        outputs: tensor shape of (B, D, 1)
+        """
         if length is None:
-            x = torch.mean(input, dim=2, keep_dim=True)
+            x = torch.mean(inputs, dim=2, keep_dim=True)
         else:
-            max_len = input.size(2)
-            mask, num_values = lens_to_mask(length, max_len=max_len, device=input.device)
-            x = torch.sum((input * mask), dim=2, keepdim=True) / (num_values)
-
+            max_len = inputs.size(2)
+            # shape of `mask` is (B, 1, T) and shape of `num_values` is (B, 1, 1)
+            mask, num_values = lens_to_mask(length, max_len=max_len, device=inputs.device)
+            # shape of `x` is (B, D, 1)
+            x = torch.sum((inputs * mask), dim=2, keepdim=True) / (num_values)
         out = self.se_layer(x)
-        return out * input
+        return out * inputs
 
 
 class TdnnSeModule(nn.Module):
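
One remaining bug survives this commit: `torch.mean`'s keyword argument is `keepdim`, not `keep_dim`, so the `length is None` branch would still raise a `TypeError`. A corrected standalone sketch of the masked mean pooling follows; the `lens_to_mask` helper here is an assumed re-implementation matching the shapes documented in the diff's comments, not the repo's actual code:

```python
import torch

def lens_to_mask(lengths: torch.Tensor, max_len: int, device=None):
    """Assumed helper: (B,) lengths -> (B, 1, T) float mask, (B, 1, 1) counts."""
    idx = torch.arange(max_len, device=device).unsqueeze(0)   # (1, T)
    mask = (idx < lengths.unsqueeze(1)).unsqueeze(1).float()  # (B, 1, T)
    num_values = mask.sum(dim=2, keepdim=True)                # (B, 1, 1)
    return mask, num_values

def masked_mean(inputs: torch.Tensor, length=None) -> torch.Tensor:
    """Mean of a (B, D, T) tensor over time, ignoring padded frames."""
    if length is None:
        # the keyword is `keepdim`; `keep_dim` raises a TypeError
        return torch.mean(inputs, dim=2, keepdim=True)
    max_len = inputs.size(2)
    mask, num_values = lens_to_mask(length, max_len=max_len, device=inputs.device)
    return torch.sum(inputs * mask, dim=2, keepdim=True) / num_values
```

Because batched eval here uses `eval_batch_size: 1` (no padding), the buggy unmasked branch is the one most likely to be hit, which may explain why the error went unnoticed.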