Improve model card: Fix git clone typo, add citation, and enhance description

#1 opened by nielsr (HF Staff)

Files changed (1): README.md (+40, −20)
README.md CHANGED
@@ -5,6 +5,7 @@ datasets:
  - mozilla-foundation/common_voice_11_0
  language:
  - es
  license: openrail
  metrics:
  - accuracy
@@ -13,25 +14,25 @@ tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  - speaker_dialect_classification
- library_name: transformers
  ---

  # Whisper-Large v3 for Spanish Dialect Classification

  # Model Description
- This model includes the implementation of Spanish dialect classification described in <a href="https://arxiv.org/abs/2508.01691"><strong>**Voxlect: A Speech Foundation Model Benchmark for Modeling Dialect and Regional Languages Around the Globe**</strong></a>

  Github repository: https://github.com/tiantiaf0627/voxlect

- The included Spanish dialects are:
  ```
  [
- "Andino-Pacífico",
- "Caribe and Central",
  "Chileno",
- "Mexican",
- "Penisular",
- "Rioplatense",
  ]
  ```

@@ -39,7 +40,7 @@ The included Spanish dialects are:

  ## Download repo
  ```bash
- git clone git@github.com:tiantiaf0627/voxvoxlect
  ```
  ## Install the package
  ```bash
@@ -67,31 +68,50 @@ model.eval()
  ```python
  # Label List
  dialect_list = [
- "Andino-Pacífico",
- "Caribe and Central",
  "Chileno",
- "Mexican",
- "Penisular",
- "Rioplatense",
  ]
-
  # Load data, here just zeros as the example
  # Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
  # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
  max_audio_length = 15 * 16000
  data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
  logits, embeddings = model(data, return_feature=True)
-
  # Probability and output
  dialect_prob = F.softmax(logits, dim=1)
  print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
  ```

- Responsible Use: Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.
-
- ## If you have any questions, please contact: Tiantian Feng ([email protected])

  ❌ **Out-of-Scope Use**
  - Clinical or diagnostic applications
  - Surveillance
- - Privacy-invasive applications
  - mozilla-foundation/common_voice_11_0
  language:
  - es
+ library_name: transformers
  license: openrail
  metrics:
  - accuracy

  - model_hub_mixin
  - pytorch_model_hub_mixin
  - speaker_dialect_classification
  ---

  # Whisper-Large v3 for Spanish Dialect Classification

  # Model Description
+ This model, based on OpenAI's Whisper-Large v3, is fine-tuned for Spanish dialect classification. It is part of **Voxlect**, a benchmark for modeling dialects and regional languages worldwide with speech foundation models. Voxlect evaluates a wide range of languages and dialects, using over 2 million training utterances from 30 publicly available speech corpora. This model provides the Spanish dialect classifier, with the labels listed below.

+ Paper: [Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe](https://arxiv.org/abs/2508.01691)
  Github repository: https://github.com/tiantiaf0627/voxlect

+ The included Spanish dialects are:
  ```
  [
+ "Andino-Pacífico",
+ "Caribe and Central",
  "Chileno",
+ "Mexican",
+ "Penisular",
+ "Rioplatense",
  ]
  ```

  ## Download repo
  ```bash
+ git clone git@github.com:tiantiaf0627/voxlect.git
  ```
  ## Install the package
  ```bash

  ```python
  # Label List
  dialect_list = [
+ "Andino-Pacífico",
+ "Caribe and Central",
  "Chileno",
+ "Mexican",
+ "Penisular",
+ "Rioplatense",
  ]
+
  # Load data, here just zeros as the example
  # Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
  # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
  max_audio_length = 15 * 16000
  data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
  logits, embeddings = model(data, return_feature=True)
+
  # Probability and output
  dialect_prob = F.softmax(logits, dim=1)
  print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
  ```
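The snippet above feeds zeros to the model; real audio must first be downmixed to mono, resampled to 16 kHz, and truncated to 15 seconds, as the comments require. A minimal sketch of such a helper (hypothetical, not part of the voxlect API; `F.interpolate` is used only as a crude, dependency-free stand-in for a proper resampler such as torchaudio's):

```python
import torch
import torch.nn.functional as F

TARGET_SR = 16000   # the model expects 16 kHz input
MAX_SECONDS = 15    # longer clips were excluded from training

def prepare_audio(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Downmix to mono, resample to 16 kHz, and truncate to 15 s.

    `waveform` has shape [channels, samples]. Linear interpolation is a
    rough resampler; prefer torchaudio.functional.resample in practice.
    """
    if waveform.dim() == 2 and waveform.size(0) > 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # stereo -> mono
    if sample_rate != TARGET_SR:
        new_len = int(waveform.size(-1) * TARGET_SR / sample_rate)
        # F.interpolate needs a 3D [batch, channel, length] tensor
        waveform = F.interpolate(
            waveform.unsqueeze(0), size=new_len,
            mode="linear", align_corners=False,
        ).squeeze(0)
    return waveform[:, : MAX_SECONDS * TARGET_SR].float()
```

With a helper like this, the inference call becomes `logits, embeddings = model(prepare_audio(wav, sr).to(device), return_feature=True)`.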

+ # Responsible Use
+ Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.

  ❌ **Out-of-Scope Use**
  - Clinical or diagnostic applications
  - Surveillance
+ - Privacy-invasive applications
+
+ # Citation
+ If you find our work useful or use these models in your work, kindly cite the following. We appreciate your recognition!
+ ```bibtex
+ @article{feng2025voxlect,
+   title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
+   author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
+   journal={arXiv preprint arXiv:2508.01691},
+   year={2025}
+ }
+
+ @article{feng2025vox,
+   title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
+   author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
+   journal={arXiv preprint arXiv:2505.14648},
+   year={2025}
+ }
+ ```
+
+ ## Contact
+ If you have any questions, please contact: Tiantian Feng ([email protected])