---
language:
- ca
- es
base_model:
- nvidia/stt_es_conformer_transducer_large
---

# NVIDIA Conformer-Transducer Large (ca-es)

## Table of Contents
<details>
<summary>Click to expand</summary>

- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Training Details](#training-details)
- [Citation](#citation)
- [Additional Information](#additional-information)

</details>

## Summary

The "stt_ca-es_conformer_transducer_large" is an acoustic model based on ["NVIDIA/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large/) suitable for bilingual Catalan-Spanish Automatic Speech Recognition.

## Model Description

This model transcribes speech in the lowercase Catalan and Spanish alphabet, including spaces. It was fine-tuned on a bilingual ca-es dataset comprising xx hours, and it is a "large" variant of Conformer-Transducer, with around 120 million parameters.
See the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.

## Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan and Spanish. It is intended to transcribe audio files in Catalan and Spanish to plain text without punctuation.

## How to Get Started with the Model

To see an updated and functional version of this code, please check our [Notebook](insert notebook link).

### Installation

To use this model, install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
```
pip install nemo_toolkit['all']
```

### For Inference
To transcribe audio in Catalan and Spanish using this model, you can follow this example:

```python
import nemo.collections.asr as nemo_asr

# Placeholder paths: the downloaded .nemo checkpoint and an audio file to transcribe
model_path = "stt_ca-es_conformer_transducer_large.nemo"
audio_path = "audio_sample.wav"

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
transcription = nemo_asr_model.transcribe([audio_path])[0][0]
print(transcription)
```
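
If you work directly with the files hosted on the Hugging Face Hub, the checkpoint can also be fetched programmatically before restoring it. The sketch below is a minimal example, assuming the `huggingface_hub` package is installed and that the `.nemo` file in this repository is named `stt_ca-es_conformer_transducer_large.nemo`; check the repository's file list for the actual filename.

```python
from huggingface_hub import hf_hub_download
import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hub (the filename is an assumption; verify it in the repo)
model_path = hf_hub_download(
    repo_id="projecte-aina/stt_ca-es_conformer_transducer_large",
    filename="stt_ca-es_conformer_transducer_large.nemo",
)

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
print(nemo_asr_model.transcribe(["audio_sample.wav"])[0][0])
```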

## Training Details

### Training data

The model was trained on bilingual datasets in Catalan and Spanish. The total number of hours is xx.
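
For reference, NeMo reads training data from JSON-lines manifests in which each line describes one utterance (`audio_filepath`, `duration`, `text`). The snippet below is only an illustrative sketch with made-up file names and transcripts, not the actual training manifests used for this model.

```python
import json

# Illustrative NeMo-style manifest entries (paths, durations and transcripts are made up)
entries = [
    {"audio_filepath": "clips/ca_0001.wav", "duration": 3.2, "text": "bon dia a tothom"},
    {"audio_filepath": "clips/es_0001.wav", "duration": 2.7, "text": "buenos días a todos"},
]

with open("train_ca_es_manifest.json", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```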

### Training procedure

This model is the result of fine-tuning the base model ["Nvidia/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large) by following this [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Transducers_with_HF_Datasets.ipynb).
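
As a rough orientation, fine-tuning a NeMo transducer checkpoint generally follows the pattern sketched below. The manifest paths and hyperparameters are placeholders, and this is not the exact recipe used for this model; refer to the linked tutorial for the complete procedure.

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Start from the Spanish base checkpoint
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="stt_es_conformer_transducer_large"
)
# Note: fully covering Catalan may also require updating the tokenizer
# (e.g. via change_vocabulary); that step is omitted in this sketch.

# Point the model at bilingual ca-es manifests (placeholder paths and settings)
asr_model.setup_training_data(train_data_config={
    "manifest_filepath": "train_ca_es_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
})
asr_model.setup_validation_data(val_data_config={
    "manifest_filepath": "dev_ca_es_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
})

trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=50)
trainer.fit(asr_model)
```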

## Citation
If this model contributes to your research, please cite the work:
```bibtex
@misc{messaoudi2024sttcaesconformertransducerlarge,
      title={Bilingual ca-es ASR Model: stt_ca-es_conformer_transducer_large},
      author={Messaoudi, Abir and Külebi, Baybars},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/stt_ca-es_conformer_transducer_large},
      year={2024}
}
```

## Additional Information

### Author

The fine-tuning process was performed during 2024 in the [Language Technologies Unit](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Abir Messaoudi](https://huggingface.co/AbirMessaoudi).

### Contact
For further information, please send an email to <[email protected]>.

### Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

### License

[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)

### Funding
This work has been promoted and financed by the Generalitat de Catalunya through the [Aina project](https://projecteaina.cat/).

The training of the model was possible thanks to the computing time provided by the [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.