parakeet-rnnt-1.1b_cv17_es_ep18_1270h
Summary
The "parakeet-rnnt-1.1b_cv17_es_ep18_1270h" is an acoustic model based on "nvidia/parakeet-rnnt-1.1b" suitable for Automatic Speech Recognition in Spanish.
Model Description
The "parakeet-rnnt-1.1b_cv17_es_ep18_1270h" is an acoustic model suitable for Automatic Speech Recognition in Spanish. It is the result of fine-tuning the model "nvidia/parakeet-rnnt-1.1b" with 1270 hours of Spanish data from Mozilla Common Voice 17.0.
Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in Spanish. The model is intended to transcribe audio files in Spanish to plain text without punctuation.
How to Get Started with the Model
For an updated and functional version of this code, please see NVIDIA's official repository.
Installation
To use this model, install the NVIDIA NeMo Framework:
Create a virtual environment:
python -m venv /path/to/venv
Activate the environment:
source /path/to/venv/bin/activate
Install the modules:
BRANCH=main
python -m pip install "git+https://github.com/NVIDIA/NeMo.git@${BRANCH}#egg=nemo_toolkit[all]"
For Inference
In order to transcribe audio in Spanish using this model, you can follow this example:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="projecte-aina/parakeet-rnnt-1.1b_cv17_es_ep18_1270h")
output = asr_model.transcribe(['YOUR_WAV_FILE.wav'])
print(output[0].text)
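Before transcribing, it can help to confirm the format of the input file. The sketch below uses only the Python standard library and is not part of NeMo. The 16 kHz mono target is an assumption taken from the base model's documentation, not something this card states, and the helper names `describe_wav` and `matches_expected_format` are hypothetical.

```python
import wave

# Assumption: 16 kHz mono comes from the base model's documentation,
# not from this model card. Adjust if your setup differs.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1


def describe_wav(source):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration


def matches_expected_format(source):
    """True when the file has the assumed sample rate and channel count."""
    rate, channels, _ = describe_wav(source)
    return rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
```

For example, `matches_expected_format('YOUR_WAV_FILE.wav')` returns `False` for a 44.1 kHz stereo recording, which is a hint to resample it before comparing transcription quality.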
Training Details
Training data
The specific dataset used to create the model is called "cv17_es_other_automatically_verified".
Training procedure
This model is the result of fine-tuning the model "parakeet-rnnt-1.1b" by following this tutorial.
Training Hyperparameters
- language: spanish
- hours of training audio: 1270
- learning rate: 2e-4
- devices=4
- num_nodes=8
- accelerator=accelerator
- strategy="ddp"
- max_epochs=50
- enable_checkpointing=True
- logger=False
- log_every_n_steps=100
- check_val_every_n_epoch=1
- precision='bf16-mixed'
- callbacks=[checkpoint_callback]
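Most of the entries above look like keyword arguments to a PyTorch Lightning `Trainer`, although the card does not say so explicitly. As a rough sketch under that assumption, the values can be collected in a plain dictionary to work out how many processes take part in distributed training. The `accelerator` and `callbacks` entries are omitted because their concrete values are not given, and `world_size` is a hypothetical helper.

```python
# Trainer-style settings copied from the hyperparameter list.
# `accelerator` and `callbacks` are left out: the card only names
# variables for them, not their values.
trainer_kwargs = {
    "devices": 4,
    "num_nodes": 8,
    "strategy": "ddp",
    "max_epochs": 50,
    "enable_checkpointing": True,
    "logger": False,
    "log_every_n_steps": 100,
    "check_val_every_n_epoch": 1,
    "precision": "bf16-mixed",
}


def world_size(kwargs):
    """Number of DDP processes: one per device on every node."""
    return kwargs["devices"] * kwargs["num_nodes"]


print(world_size(trainer_kwargs))  # 32
```

With 4 devices on each of 8 nodes, the run would span 32 processes in total.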
Citation
If this model contributes to your research, please cite the work:
@misc{mena2024parakeetspanish,
title={Acoustic Model in Spanish: parakeet-rnnt-1.1b_cv17_es_ep18_1270h.},
author={Hernandez Mena, Carlos Daniel},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/projecte-aina/parakeet-rnnt-1.1b_cv17_es_ep18_1270h},
year={2024}
}
Additional Information
Author
The fine-tuning process was performed in November 2024 in the Language Technologies Unit of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena.
Contact
For further information, please send an email to [email protected].
Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
License
Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.
Evaluation results
- WER on Common Voice 17.0 Spanish (Test): 3.930 (self-reported)
- WER on Common Voice 17.0 Spanish (Dev): 3.550 (self-reported)
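For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition for a single utterance, expressed as a percentage. It is not the evaluation script used for the figures above, and it assumes the reported values are percentages. Corpus-level WER is normally computed by summing errors and reference words over all utterances rather than averaging per-utterance scores.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("el gato negro duerme", "el gato duerme"))  # 25.0
```

Because the model produces text without punctuation, reference transcripts would need the same normalization before scoring, otherwise punctuation attached to words is counted as errors.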