Improve model card: Fix git clone typo, add citation, and enhance description
This PR significantly improves the model card by:
* **Correcting the `git clone` command:** Fixed the typo in the example command from `voxvoxlect` to `voxlect.git`, ensuring users can correctly clone the repository.
* **Adding comprehensive citation information:** Included the BibTeX entries for the relevant research papers (`Voxlect` and `Vox-Profile`), which is crucial for proper academic attribution.
* **Enhancing the model description:** Expanded the introductory section to provide more context about the Voxlect benchmark and the model's specific role in Spanish dialect classification, drawing information from the paper abstract and the main GitHub README.
These changes make the model card more accurate, complete, and user-friendly for researchers and practitioners.
@@ -5,6 +5,7 @@ datasets:
 - mozilla-foundation/common_voice_11_0
 language:
 - es
+library_name: transformers
 license: openrail
 metrics:
 - accuracy
@@ -13,25 +14,25 @@ tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 - speaker_dialect_classification
-library_name: transformers
 ---
 
 # Whisper-Large v3 for Spanish Dialect Classification
 
 # Model Description
-This model
+This model, based on OpenAI's Whisper-Large v3, is fine-tuned for Spanish dialect classification. It is part of the **Voxlect** benchmark, a novel initiative for modeling dialects and regional languages worldwide using speech foundation models. The Voxlect project conducts comprehensive benchmark evaluations on a wide range of languages and dialects, utilizing over 2 million training utterances from 30 publicly available speech corpora. This specific model provides classification for Spanish dialects, as detailed below.
 
+Paper: [Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe](https://arxiv.org/abs/2508.01691)
 Github repository: https://github.com/tiantiaf0627/voxlect
 
 The included Spanish dialects are:
 ```
 [
     "Andino-Pacífico",
     "Caribe and Central",
     "Chileno",
     "Mexican",
     "Penisular",
     "Rioplatense",
 ]
 ```
 
@@ -39,7 +40,7 @@ The included Spanish dialects are:
 
 ## Download repo
 ```bash
-git clone [email protected]:tiantiaf0627/
+git clone [email protected]:tiantiaf0627/voxlect.git
 ```
 ## Install the package
 ```bash
@@ -67,31 +68,50 @@ model.eval()
 ```python
 # Label List
 dialect_list = [
     "Andino-Pacífico",
     "Caribe and Central",
     "Chileno",
     "Mexican",
     "Penisular",
     "Rioplatense",
 ]
 
 # Load data, here just zeros as the example
 # Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computation limitation)
 # So you need to prepare your audio to a maximum of 15 seconds, 16kHz and mono channel
 max_audio_length = 15 * 16000
 data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
 logits, embeddings = model(data, return_feature=True)
 
 # Probability and output
 dialect_prob = F.softmax(logits, dim=1)
 print(dialect_list[torch.argmax(dialect_prob).detach().cpu().item()])
 ```
 
-Responsible Use
-
-## If you have any questions, please contact: Tiantian Feng ([email protected])
+# Responsible Use
+Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.
 
 ❌ **Out-of-Scope Use**
 - Clinical or diagnostic applications
 - Surveillance
 - Privacy-invasive applications
+
+# Citation
+If you like our work or use the models in your work, kindly cite the following. We appreciate your recognition!
+```bibtex
+@article{feng2025voxlect,
+  title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
+  author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
+  year={2025}
+}
+
+@article{feng2025vox,
+  title={Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits},
+  author={Feng, Tiantian and Lee, Jihwan and Xu, Anfeng and Lee, Yoonjeong and Lertpetchpun, Thanathai and Shi, Xuan and Wang, Helin and Thebaud, Thomas and Moro-Velazquez, Laureano and Byrd, Dani and others},
+  journal={arXiv preprint arXiv:2505.14648},
+  year={2025}
+}
+```
+
+## Contact
+If you have any questions, please contact: Tiantian Feng ([email protected])