---
language:
- ca
- es
base_model:
- nvidia/stt_es_conformer_transducer_large
---

# NVIDIA Conformer-Transducer Large (ca-es)

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Training Details](#training-details)
- [Citation](#citation)
- [Additional Information](#additional-information)

</details>

## Summary

The "stt_ca-es_conformer_transducer_large" is an acoustic model based on ["NVIDIA/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large/) suitable for bilingual Catalan-Spanish Automatic Speech Recognition.

## Model Description

This model transcribes speech in the lowercase Catalan and Spanish alphabet, including spaces. It was fine-tuned on a bilingual ca-es dataset comprising xx hours, and it is a "large" variant of Conformer-Transducer, with around 120 million parameters.
See the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.

## Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan and Spanish. It is intended to transcribe audio files in Catalan and Spanish to plain text without punctuation.

## How to Get Started with the Model

To see an updated and functional version of this code, please check our [Notebook](insert notebook link).

### Installation

To use this model, install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
```
pip install nemo_toolkit['all']
```

### For Inference
To transcribe audio in Catalan and Spanish using this model, you can follow this example:

```python
import nemo.collections.asr as nemo_asr

# Placeholder paths: the downloaded .nemo checkpoint and an audio file to transcribe
model_path = "stt_ca-es_conformer_transducer_large.nemo"
audio_path = "audio_sample.wav"

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
transcription = nemo_asr_model.transcribe([audio_path])[0][0]
print(transcription)
```
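
If you work directly with the files hosted on the Hugging Face Hub, the checkpoint can also be fetched programmatically before restoring it. The sketch below is a minimal example, assuming the `huggingface_hub` package is installed and that the `.nemo` file in this repository is named `stt_ca-es_conformer_transducer_large.nemo`; check the repository's file list for the actual filename.

```python
from huggingface_hub import hf_hub_download
import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hub (the filename is an assumption; verify it in the repo)
model_path = hf_hub_download(
    repo_id="projecte-aina/stt_ca-es_conformer_transducer_large",
    filename="stt_ca-es_conformer_transducer_large.nemo",
)

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
print(nemo_asr_model.transcribe(["audio_sample.wav"])[0][0])
```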

## Training Details

### Training data

The model was trained on bilingual datasets in Catalan and Spanish. The total number of hours is xx.
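
For reference, NeMo reads training data from JSON-lines manifests in which each line describes one utterance (`audio_filepath`, `duration`, `text`). The snippet below is only an illustrative sketch with made-up file names and transcripts, not the actual training manifests used for this model.

```python
import json

# Illustrative NeMo-style manifest entries (paths, durations and transcripts are made up)
entries = [
    {"audio_filepath": "clips/ca_0001.wav", "duration": 3.2, "text": "bon dia a tothom"},
    {"audio_filepath": "clips/es_0001.wav", "duration": 2.7, "text": "buenos días a todos"},
]

with open("train_ca_es_manifest.json", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```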

### Training procedure

This model is the result of fine-tuning the base model ["Nvidia/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large) by following this [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Transducers_with_HF_Datasets.ipynb).
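
As a rough orientation, fine-tuning a NeMo transducer checkpoint generally follows the pattern sketched below. The manifest paths and hyperparameters are placeholders, and this is not the exact recipe used for this model; refer to the linked tutorial for the complete procedure.

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Start from the Spanish base checkpoint
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_es_conformer_transducer_large"
)
# Note: fully covering Catalan may also require updating the tokenizer
# (e.g. via change_vocabulary); that step is omitted in this sketch.

# Point the model at bilingual ca-es manifests (placeholder paths and settings)
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "train_ca_es_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_validation_data(val_data_config={
    "manifest_filepath": "dev_ca_es_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=50)
trainer.fit(asr_model)
```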

## Citation
If this model contributes to your research, please cite the work:
```bibtex
@misc{messaoudi2024sttcaesconformertransducerlarge,
      title={Bilingual ca-es ASR Model: stt_ca-es_conformer_transducer_large},
      author={Messaoudi, Abir and Külebi, Baybars},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/stt_ca-es_conformer_transducer_large},
      year={2024}
}
```

## Additional Information

### Author

The fine-tuning process was performed during 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).

### Contact
For further information, please send an email to <[email protected]>.

### Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

### License

[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)

### Funding
This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).

The training of the model was possible thanks to the computing time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.