Improve model card: Fix git clone typo, add citation, and enhance description
#1 opened by nielsr (HF Staff)

README.md CHANGED
````diff
@@ -5,6 +5,7 @@ datasets:
 - mozilla-foundation/common_voice_11_0
 language:
 - es
+library_name: transformers
 license: openrail
 metrics:
 - accuracy
@@ -13,25 +14,25 @@ tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 - speaker_dialect_classification
-library_name: transformers
 ---
 
 # Whisper-Large v3 for Spanish Dialect Classification
 
 # Model Description
-This model
+This model, based on OpenAI's Whisper-Large v3, is fine-tuned for Spanish dialect classification. It is part of the **Voxlect** benchmark, a novel initiative for modeling dialects and regional languages worldwide using speech foundation models. The Voxlect project conducts comprehensive benchmark evaluations on a wide range of languages and dialects, utilizing over 2 million training utterances from 30 publicly available speech corpora. This specific model provides classification for Spanish dialects, as detailed below.
 
+Paper: [Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe](https://arxiv.org/abs/2508.01691)
 Github repository: https://github.com/tiantiaf0627/voxlect
 
 The included Spanish dialects are:
 ```
 [
     "Andino-Pacífico",
     "Caribe and Central",
     "Chileno",
     "Mexican",
     "Penisular",
     "Rioplatense",
 ]
 ```
@@ -39,7 +40,7 @@ The included Spanish dialects are:
 
 ## Download repo
 ```bash
-git clone git@github.com:tiantiaf0627/
+git clone git@github.com:tiantiaf0627/voxlect.git
 ```
 ## Install the package
 ```bash
@@ -67,31 +68,51 @@ model.eval()
 ```python
 # Label List
 dialect_list = [
     "Andino-Pacífico",
     "Caribe and Central",
     "Chileno",
     "Mexican",
     "Penisular",
     "Rioplatense",
 ]
 
 # Load data, here just zeros as the example
 # Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
 # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
 max_audio_length = 15 * 16000
 data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
 logits, embeddings = model(data, return_feature=True)
 
 # Probability and output
 dialect_prob = F.softmax(logits, dim=1)
 print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
 ```
 
-Responsible Use
-
-## If you have any questions, please contact: Tiantian Feng ([email protected])
+# Responsible Use
+Users should respect the privacy and consent of data subjects and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.
 
 ❌ **Out-of-Scope Use**
 - Clinical or diagnostic applications
 - Surveillance
 - Privacy-invasive applications
+
+# Citation
+If you like our work or use the models in your work, kindly cite the following. We appreciate your recognition!
+```bibtex
+@article{feng2025voxlect,
+  title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
+  author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
+  journal={arXiv preprint arXiv:2508.01691},
+  year={2025}
+}
+
+@article{feng2025vox,
+  title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
+  author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
+  journal={arXiv preprint arXiv:2505.14648},
+  year={2025}
+}
+```
+
+## Contact
+If you have any questions, please contact: Tiantian Feng ([email protected])
````