nielsr HF Staff committed on
Commit
c7d9958
·
verified ·
1 Parent(s): 17edea0

Improve model card: Fix git clone typo, add citation, and enhance description

This PR significantly improves the model card by:

* **Correcting the `git clone` command:** Fixed the typo in the example command from `voxvoxlect` to `voxlect.git`, ensuring users can correctly clone the repository.
* **Adding comprehensive citation information:** Included the BibTeX entries for the relevant research papers (`Voxlect` and `Vox-Profile`), which is crucial for proper academic attribution.
* **Enhancing the model description:** Expanded the introductory section to provide more context about the Voxlect benchmark and the model's specific role in Spanish dialect classification, drawing information from the paper abstract and the main GitHub README.

These changes make the model card more accurate, complete, and user-friendly for researchers and practitioners.

Files changed (1)
  1. README.md +40 -20
README.md CHANGED
@@ -5,6 +5,7 @@ datasets:
5
  - mozilla-foundation/common_voice_11_0
6
  language:
7
  - es
 
8
  license: openrail
9
  metrics:
10
  - accuracy
@@ -13,25 +14,25 @@ tags:
13
  - model_hub_mixin
14
  - pytorch_model_hub_mixin
15
  - speaker_dialect_classification
16
- library_name: transformers
17
  ---
18
 
19
  # Whisper-Large v3 for Spanish Dialect Classification
20
 
21
  # Model Description
22
- This model includes the implementation of Spanish dialect classification described in <a href="https://arxiv.org/abs/2508.01691"><strong>**Voxlect: A Speech Foundation Model Benchmark for Modeling Dialect and Regional Languages Around the Globe**</strong></a>
23
 
 
24
  Github repository: https://github.com/tiantiaf0627/voxlect
25
 
26
- The included Spanish dialects are:
27
  ```
28
  [
29
- "Andino-Pacífico",
30
- "Caribe and Central",
31
  "Chileno",
32
- "Mexican",
33
- "Penisular",
34
- "Rioplatense",
35
  ]
36
  ```
37
 
@@ -39,7 +40,7 @@ The included Spanish dialects are:
39
 
40
  ## Download repo
41
  ```bash
42
- git clone git@github.com:tiantiaf0627/voxvoxlect
43
  ```
44
  ## Install the package
45
  ```bash
@@ -67,31 +68,50 @@ model.eval()
67
  ```python
68
  # Label List
69
  dialect_list = [
70
- "Andino-Pacífico",
71
- "Caribe and Central",
72
  "Chileno",
73
- "Mexican",
74
- "Penisular",
75
- "Rioplatense",
76
  ]
77
-
78
  # Load data, here just zeros as the example
79
  # Our training data filters output audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
80
  # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
81
  max_audio_length = 15 * 16000
82
  data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
83
  logits, embeddings = model(data, return_feature=True)
84
-
85
  # Probability and output
86
  dialect_prob = F.softmax(logits, dim=1)
87
  print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
88
  ```
89
 
90
- Responsible Use: Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.
91
-
92
- ## If you have any questions, please contact: Tiantian Feng ([email protected])
93
 
94
  ❌ **Out-of-Scope Use**
95
  - Clinical or diagnostic applications
96
  - Surveillance
97
- - Privacy-invasive applications
 
5
  - mozilla-foundation/common_voice_11_0
6
  language:
7
  - es
8
+ library_name: transformers
9
  license: openrail
10
  metrics:
11
  - accuracy
 
14
  - model_hub_mixin
15
  - pytorch_model_hub_mixin
16
  - speaker_dialect_classification
 
17
  ---
18
 
19
  # Whisper-Large v3 for Spanish Dialect Classification
20
 
21
  # Model Description
22
+ This model, based on OpenAI's Whisper-Large v3, is fine-tuned for Spanish dialect classification. It is part of the **Voxlect** benchmark, a novel initiative for modeling dialects and regional languages worldwide using speech foundation models. The Voxlect project conducts comprehensive benchmark evaluations on a wide range of languages and dialects, utilizing over 2 million training utterances from 30 publicly available speech corpora. This specific model provides classification for Spanish dialects, as detailed below.
23
 
24
+ Paper: [Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe](https://arxiv.org/abs/2508.01691)
25
  Github repository: https://github.com/tiantiaf0627/voxlect
26
 
27
+ The included Spanish dialects are:
28
  ```
29
  [
30
+ "Andino-Pacífico",
31
+ "Caribe and Central",
32
  "Chileno",
33
+ "Mexican",
34
+ "Penisular",
35
+ "Rioplatense",
36
  ]
37
  ```
38
 
 
40
 
41
  ## Download repo
42
  ```bash
43
+ git clone git@github.com:tiantiaf0627/voxlect.git
44
  ```
45
  ## Install the package
46
  ```bash
 
68
  ```python
69
  # Label List
70
  dialect_list = [
71
+ "Andino-Pacífico",
72
+ "Caribe and Central",
73
  "Chileno",
74
+ "Mexican",
75
+ "Penisular",
76
+ "Rioplatense",
77
  ]
78
+
79
  # Load data, here just zeros as the example
80
  # Our training data filters output audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
81
  # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
82
  max_audio_length = 15 * 16000
83
  data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
84
  logits, embeddings = model(data, return_feature=True)
85
+
86
  # Probability and output
87
  dialect_prob = F.softmax(logits, dim=1)
88
  print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
89
  ```
90
 
91
+ # Responsible Use
92
+ Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.
 
93
 
94
  ❌ **Out-of-Scope Use**
95
  - Clinical or diagnostic applications
96
  - Surveillance
97
+ - Privacy-invasive applications
98
+
99
+ # Citation
100
+ If you like our work or use the models in your work, kindly cite the following. We appreciate your recognition!
101
+ ```bibtex
102
+ @article{feng2025voxlect,
103
+ title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
104
+ author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
105
+ year={2025}
106
+ }
107
+
108
+ @article{feng2025vox,
109
+ title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
110
+ author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
111
+ journal={arXiv preprint arXiv:2505.14648},
112
+ year={2025}
113
+ }
114
+ ```
115
+
116
+ ## Contact
117
+ If you have any questions, please contact: Tiantian Feng ([email protected])
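
The usage snippet in the README states two constraints in its comments (clips under 3 seconds give unreliable predictions, and input should be cropped to 15 seconds of 16 kHz mono audio) and then picks the top label with a softmax and argmax. The sketch below illustrates both steps in plain Python so it runs without the model. The helper names `crop_samples` and `rank_dialects` are hypothetical and not part of the Voxlect package, and the assumption that logit index `i` corresponds to entry `i` of the label list is taken from the README example rather than verified against the code.

```python
import math

SAMPLE_RATE = 16_000   # README: input must be 16 kHz, mono
MIN_SECONDS = 3        # README: shorter clips were filtered out of training
MAX_SECONDS = 15       # README: longer clips were filtered out of training

# Labels copied verbatim from the README (spelling included);
# index i is assumed to match logit i.
DIALECT_LIST = [
    "Andino-Pacífico",
    "Caribe and Central",
    "Chileno",
    "Mexican",
    "Penisular",
    "Rioplatense",
]


def crop_samples(samples):
    """Hypothetical helper: crop to 15 s and report whether the clip is at least 3 s."""
    cropped = samples[: MAX_SECONDS * SAMPLE_RATE]
    is_long_enough = len(cropped) >= MIN_SECONDS * SAMPLE_RATE
    return cropped, is_long_enough


def rank_dialects(logits):
    """Hypothetical helper: softmax over one row of logits, sorted by probability."""
    if len(logits) != len(DIALECT_LIST):
        raise ValueError("expected one logit per dialect label")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    return sorted(zip(DIALECT_LIST, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # 20 s of silence as a stand-in for a real mono waveform
    clip, ok = crop_samples([0.0] * (20 * SAMPLE_RATE))
    print(len(clip) / SAMPLE_RATE, "seconds kept; long enough:", ok)

    # Made-up logits, purely for illustration
    for label, prob in rank_dialects([0.1, 0.2, 3.0, 0.0, -1.0, 0.5]):
        print(f"{label}: {prob:.3f}")
```

With the real model, the logits for a single clip could be passed in as `logits.squeeze(0).tolist()`. Returning the full ranking rather than only the argmax makes it easier to see when the top two dialects are close, which is worth checking for clips near the 3-second lower bound.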