Improve model card: Fix git clone typo, add citation, and enhance description
This PR significantly improves the model card by:
* **Correcting the `git clone` command:** Fixed the typo in the example command from `voxvoxlect` to `voxlect.git`, ensuring users can correctly clone the repository.
* **Adding comprehensive citation information:** Included the BibTeX entries for the relevant research papers (`Voxlect` and `Vox-Profile`), which is crucial for proper academic attribution.
* **Enhancing the model description:** Expanded the introductory section to provide more context about the Voxlect benchmark and the model's specific role in Spanish dialect classification, drawing information from the paper abstract and the main GitHub README.
These changes make the model card more accurate, complete, and user-friendly for researchers and practitioners.
@@ -5,6 +5,7 @@ datasets:
 - mozilla-foundation/common_voice_11_0
 language:
 - es
+library_name: transformers
 license: openrail
 metrics:
 - accuracy
@@ -13,25 +14,25 @@ tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 - speaker_dialect_classification
-library_name: transformers
 ---
 
 # Whisper-Large v3 for Spanish Dialect Classification
 
 # Model Description
-This model
+This model, based on OpenAI's Whisper-Large v3, is fine-tuned for Spanish dialect classification. It is part of the **Voxlect** benchmark, a novel initiative for modeling dialects and regional languages worldwide using speech foundation models. The Voxlect project conducts comprehensive benchmark evaluations on a wide range of languages and dialects, utilizing over 2 million training utterances from 30 publicly available speech corpora. This specific model provides classification for Spanish dialects, as detailed below.
 
+Paper: [Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe](https://arxiv.org/abs/2508.01691)
 Github repository: https://github.com/tiantiaf0627/voxlect
 
 The included Spanish dialects are:
 ```
 [
     "Andino-Pacífico",
     "Caribe and Central",
     "Chileno",
     "Mexican",
     "Penisular",
     "Rioplatense",
 ]
 ```
 
@@ -39,7 +40,7 @@ The included Spanish dialects are:
 
 ## Download repo
 ```bash
-git clone [email protected]:tiantiaf0627/
+git clone [email protected]:tiantiaf0627/voxlect.git
 ```
 ## Install the package
 ```bash
@@ -67,31 +68,50 @@ model.eval()
 ```python
 # Label List
 dialect_list = [
     "Andino-Pacífico",
     "Caribe and Central",
     "Chileno",
     "Mexican",
     "Penisular",
     "Rioplatense",
 ]
 
 # Load data, here just zeros as the example
 # Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
 # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
 max_audio_length = 15 * 16000
 data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
 logits, embeddings = model(data, return_feature=True)
 
 # Probability and output
 dialect_prob = F.softmax(logits, dim=1)
 print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
 ```
 
-Responsible Use
-
-## If you have any questions, please contact: Tiantian Feng ([email protected])
+# Responsible Use
+Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.
 
 ❌ **Out-of-Scope Use**
 - Clinical or diagnostic applications
 - Surveillance
 - Privacy-invasive applications
+
+# Citation
+If you like our work or use the models in your work, kindly cite the following. We appreciate your recognition!
+```bibtex
+@article{feng2025voxlect,
+  title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
+  author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
+  year={2025}
+}
+
+@article{feng2025vox,
+  title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
+  author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
+  journal={arXiv preprint arXiv:2505.14648},
+  year={2025}
+}
+```
+
+## Contact
+If you have any questions, please contact: Tiantian Feng ([email protected])