parakeet-rnnt-1.1b_cv17_es_ep18_1270h


Summary

The "parakeet-rnnt-1.1b_cv17_es_ep18_1270h" is an acoustic model based on "nvidia/parakeet-rnnt-1.1b" suitable for Automatic Speech Recognition in Spanish.

Model Description

The "parakeet-rnnt-1.1b_cv17_es_ep18_1270h" is an acoustic model suitable for Automatic Speech Recognition in Spanish. It is the result of fine-tuning the model "nvidia/parakeet-rnnt-1.1b" on 1270 hours of Spanish data from Mozilla Common Voice 17.0.

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Spanish. The model is intended to transcribe audio files in Spanish to plain text without punctuation.
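Because the transcripts contain no punctuation, reference texts usually need the same treatment before they are compared with the model output (for example, when scoring word error rate). The following is a minimal sketch using only the Python standard library. The function name `normalize_reference` is hypothetical, and lowercasing is an assumption about the model's output casing that this card does not state.

```python
import unicodedata


def normalize_reference(text: str) -> str:
    """Strip punctuation and lowercase a reference transcript so it can be
    compared with unpunctuated ASR output.

    ASSUMPTION: the model emits lowercase text; drop .lower() if it does not.
    """
    # Use precomposed characters so "é" typed two different ways compares equal
    text = unicodedata.normalize("NFC", text)
    # Remove every character in a Unicode punctuation category (P*),
    # which covers Spanish marks such as ¿ and ¡ as well as , . ; : ! ?
    no_punct = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Lowercase and collapse the whitespace left behind by removed characters
    return " ".join(no_punct.lower().split())


if __name__ == "__main__":
    print(normalize_reference("¿Qué tal, Ñandú? ¡Muy bien!"))  # qué tal ñandú muy bien
```

Accented letters and "ñ" are kept because they belong to letter categories, not punctuation.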

How to Get Started with the Model

For an up-to-date, working version of this code, please see NVIDIA's official repository.

Installation

To use this model, install the NVIDIA NeMo Framework:

Create a virtual environment:

python -m venv /path/to/venv

Activate the environment:

source /path/to/venv/bin/activate

Install the modules:

BRANCH='main'
python -m pip install "git+https://github.com/NVIDIA/NeMo.git@${BRANCH}#egg=nemo_toolkit[all]"
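After installation, one way to confirm that NeMo is visible inside the activated environment is to query the installed package metadata. This is a small sketch using only the Python standard library; the distribution name `nemo_toolkit` matches the package named in the pip command above, and the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nemo_toolkit") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nemo_toolkit is not installed in this environment")
    else:
        print(f"nemo_toolkit {version} is installed")
```

Run it with the same interpreter that belongs to the virtual environment, otherwise it inspects a different set of packages.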

For Inference

In order to transcribe audio in Spanish using this model, you can follow this example:

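import nemo.collections.asr as nemo_asr

# Load the fine-tuned checkpoint (downloaded from the Hugging Face Hub on first use)
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="projecte-aina/parakeet-rnnt-1.1b_cv17_es_ep18_1270h")

# Transcribe a list of audio files; in recent NeMo versions each result exposes the transcript as .text
output = asr_model.transcribe(['YOUR_WAV_FILE.wav'])
print(output[0].text)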
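NVIDIA documents the parakeet models as taking 16 kHz mono audio. Assuming this fine-tuned model keeps that input format, it can help to check WAV files before transcribing them. This sketch uses Python's standard `wave` module, which reads uncompressed PCM WAV only; `check_wav` and the demo helper `write_silent_wav` are hypothetical names, not part of NeMo.

```python
import os
import tempfile
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000) -> List[str]:
    """Return a list of format problems found in a PCM WAV file (empty if none).

    ASSUMPTION: the model expects 16 kHz mono input, like the base parakeet model.
    """
    problems: List[str] = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems


def write_silent_wav(path: str, channels: int, rate: int, n_frames: int = 1600) -> None:
    """Write a short 16-bit PCM WAV file of silence (demo helper only)."""
    with wave.open(path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * channels * n_frames)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "stereo_44k.wav")
    write_silent_wav(demo_path, channels=2, rate=44100)
    for problem in check_wav(demo_path):
        print(problem)
```

Files that fail the check can be converted to 16 kHz mono with an audio tool of your choice before being passed to `transcribe`.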

Training Details

Training data

The specific dataset used to create the model is called "cv17_es_other_automatically_verified".
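This card does not describe how the data was prepared. For orientation only, NeMo ASR training generally reads JSON-lines manifests in which each line holds an `audio_filepath`, a `duration` in seconds and a `text` transcript. The sketch below writes such a file with the standard library; it is not the author's pipeline, and the function name, clip paths and transcripts are hypothetical.

```python
import json
import os
import tempfile
from typing import Iterable, Tuple


def write_manifest(entries: Iterable[Tuple[str, float, str]], path: str) -> int:
    """Write (audio_filepath, duration_in_seconds, transcript) tuples as a
    JSON-lines manifest, one JSON object per line, and return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in entries:
            record = {"audio_filepath": audio_filepath, "duration": duration, "text": text}
            # ensure_ascii=False keeps characters such as "ñ" and "á" human-readable
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical clip paths and transcripts, for illustration only
    demo_entries = [
        ("clips/ejemplo_0001.wav", 3.2, "buenos días a todos"),
        ("clips/ejemplo_0002.wav", 4.7, "el niño juega en el jardín"),
    ]
    manifest_path = os.path.join(tempfile.mkdtemp(), "train_manifest.json")
    print(write_manifest(demo_entries, manifest_path), "entries written to", manifest_path)
```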

Training procedure

This model is the result of fine-tuning the model "nvidia/parakeet-rnnt-1.1b" by following this tutorial.

Training Hyperparameters

  • language: Spanish
  • hours of training audio: 1270
  • learning rate: 2e-4
  • devices=4
  • num_nodes=8
  • accelerator=accelerator
  • strategy="ddp"
  • max_epochs=50
  • enable_checkpointing=True
  • logger=False
  • log_every_n_steps=100
  • check_val_every_n_epoch=1
  • precision='bf16-mixed'
  • callbacks=[checkpoint_callback]
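Most of the entries above are PyTorch Lightning `Trainer` arguments. The sketch below gathers them into a dict and shows that `devices=4` on each of `num_nodes=8` gives 32 DDP processes (one per GPU). The value `"gpu"` for `accelerator` is an assumption, since the card only names a variable; the callback and the `Trainer` call are left as a comment so the snippet runs without Lightning installed.

```python
# Trainer arguments from the hyperparameter list, gathered into one dict.
trainer_kwargs = {
    "devices": 4,
    "num_nodes": 8,
    "accelerator": "gpu",  # ASSUMPTION: the card only says accelerator=accelerator
    "strategy": "ddp",
    "max_epochs": 50,
    "enable_checkpointing": True,
    "logger": False,
    "log_every_n_steps": 100,
    "check_val_every_n_epoch": 1,
    "precision": "bf16-mixed",
}


def world_size(kwargs: dict) -> int:
    """Total number of DDP processes: one per device on every node."""
    return kwargs["devices"] * kwargs["num_nodes"]


if __name__ == "__main__":
    print(world_size(trainer_kwargs))  # 32
    # With Lightning installed and a checkpoint callback defined, the dict
    # could be passed on, for example:
    # trainer = lightning.pytorch.Trainer(**trainer_kwargs, callbacks=[checkpoint_callback])
```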

Citation

If this model contributes to your research, please cite the work:

@misc{mena2024parakeetspanish,
      title={Acoustic Model in Spanish: parakeet-rnnt-1.1b_cv17_es_ep18_1270h.}, 
      author={Hernandez Mena, Carlos Daniel},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/parakeet-rnnt-1.1b_cv17_es_ep18_1270h},
      year={2024}
}

Additional Information

Author

The fine-tuning process was performed in November 2024 at the Language Technologies Unit of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena.

Contact

For further information, please send an email to [email protected].

Copyright

Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.

License

Apache-2.0

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.
