AbirMessaoudi commited on
Commit
4465cb0
·
verified ·
1 Parent(s): f15e951

Update Model Card

Files changed (1)
  1. README.md +96 -1
README.md CHANGED
@@ -5,4 +5,99 @@ language:
5
  - es
6
  base_model:
7
  - nvidia/stt_es_conformer_transducer_large
8
- ---
5
  - es
6
  base_model:
7
  - nvidia/stt_es_conformer_transducer_large
8
+ ---
9
+
10
+ ---
11
+ # NVIDIA Conformer-Transducer Large (ca-es)
12
+
13
+ ## Table of Contents
14
+ <details>
15
+ <summary>Click to expand</summary>
16
+
17
+ - [Model Description](#model-description)
18
+ - [Intended Uses and Limitations](#intended-uses-and-limitations)
19
+ - [How to Get Started with the Model](#how-to-get-started-with-the-model)
20
+ - [Training Details](#training-details)
21
+ - [Citation](#citation)
22
+ - [Additional Information](#additional-information)
23
+
24
+ </details>
25
+
26
+ ## Summary
27
+
28
+ The "stt_ca-es_conformer_transducer_large" is an acoustic model based on ["NVIDIA/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large/), suitable for bilingual Catalan-Spanish Automatic Speech Recognition.
29
+
30
+ ## Model Description
31
+
32
+ This model transcribes speech into the lowercase Catalan and Spanish alphabets, including spaces, and was fine-tuned on a bilingual ca-es dataset comprising xx hours. It is a "large" variant of Conformer-Transducer, with around 120 million parameters.
33
+ See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
34
+
35
+ ## Intended Uses and Limitations
36
+
37
+ This model can be used for Automatic Speech Recognition (ASR) in Catalan and Spanish. It is intended to transcribe Catalan and Spanish audio files to plain text without punctuation.
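Because the model emits lowercase plain text without punctuation, reference transcripts usually need the same treatment before they can be compared with its output. The following is a minimal sketch of such a normalizer, not the preprocessing used by the authors. The function name `normalize_transcript` and the choice to keep apostrophes, hyphens, and the Catalan middle dot (`·`) are assumptions made for illustration.

```python
import re

# Assumption: keep letters (including accented ones, ñ and ç), digits,
# whitespace, apostrophes, hyphens and the Catalan middle dot.
# Everything else is treated as punctuation and replaced by a space.
_PUNCTUATION = re.compile(r"[^\w\s'·-]", flags=re.UNICODE)


def normalize_transcript(text: str) -> str:
    """Lowercase the text, drop punctuation and collapse repeated spaces."""
    text = text.lower()
    text = _PUNCTUATION.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("¿Cómo estás? Bon dia, col·lega!"))
```

Adjust the character class if your evaluation data follows a different convention for apostrophes or hyphens.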
38
+
39
+ ## How to Get Started with the Model
40
+
41
+ To see an updated and functional version of this code, please check our [Notebook](insert notebook link)
42
+
43
+ ### Installation
44
+
45
+ To use this model, install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
46
+ ```
47
+ pip install "nemo_toolkit[all]"
48
+ ```
49
+
50
+ ### For Inference
51
+ To transcribe audio in Catalan and Spanish using this model, you can follow this example:
52
+
53
+
54
+ ```python
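+ import nemo.collections.asr as nemo_asr
+
+ # Placeholders: point these at your local .nemo checkpoint and audio file.
+ model = "path/to/model.nemo"
+ audio_path = "path/to/audio.wav"
+
+ nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model)
+ # transcribe() takes a list of paths; the [0][0] indexing may vary by NeMo version.
+ transcription = nemo_asr_model.transcribe([audio_path])[0][0]
+ print(transcription)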
60
+ ```
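To transcribe many files rather than one, it can help to collect the audio paths and pass them to `transcribe` in fixed-size batches. The sketch below covers only the file handling in plain Python. The helper names `list_audio_files` and `batched`, the extension list, and the batch size of 8 are illustrative assumptions, and the NeMo call is left as a comment because it needs the model loaded as in the example above.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def list_audio_files(folder: str, extensions: Sequence[str] = (".wav", ".flac")) -> List[str]:
    """Return the sorted paths of all audio files found under `folder`."""
    return sorted(
        str(path)
        for path in Path(folder).rglob("*")
        if path.suffix.lower() in extensions
    )


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical usage, assuming `nemo_asr_model` was restored as shown above:
# for batch in batched(list_audio_files("path/to/audio_folder"), 8):
#     results = nemo_asr_model.transcribe(batch)
```

Batching keeps memory use predictable on long file lists; the right batch size depends on audio length and available GPU memory.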
61
+
62
+ ## Training Details
63
+
64
+ ### Training data
65
+
66
+ The model was trained on bilingual datasets in Catalan and Spanish. The total number of hours is xx.
67
+
68
+ ### Training procedure
69
+
70
+ This model is the result of fine-tuning the base model ["Nvidia/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large) by following this [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Transducers_with_HF_Datasets.ipynb).
71
+
72
+ ## Citation
73
+ If this model contributes to your research, please cite the work:
74
+ ```bibtex
75
+ @misc{mena2024whisperlarge3catparla,
76
+ title={Bilingual ca-es ASR Model: stt_ca-es_conformer_transducer_large.},
77
+ author={Messaoudi, Abir and Külebi, Baybars},
78
+ organization={Barcelona Supercomputing Center},
79
+ url={https://huggingface.co/projecte-aina/stt_ca-es_conformer_transducer_large},
80
+ year={2024}
81
+ }
82
+ ```
83
+
84
+ ## Additional Information
85
+
86
+ ### Author
87
+
88
+ The fine-tuning process was performed during 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).
89
+
90
+ ### Contact
91
+ For further information, please send an email to <[email protected]>.
92
+
93
+ ### Copyright
94
+ Copyright (c) 2024 by the Language Technologies Unit, Barcelona Supercomputing Center.
95
+
96
+ ### License
97
+
98
+ [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
99
+
100
+ ### Funding
101
+ This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).
102
+
103
+ The training of the model was made possible by the computing time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.