ONNX conversion
Trying to convert the parakeet-tdt-0.6b-v2 model to ONNX format for deployment, but I'm unsure how to proceed with the export. Has anyone successfully converted this model or can share guidance on the correct steps?
Thanks @nithinraok - exporting was as simple as following that guide. I've uploaded the converted model to https://huggingface.co/onnx-community/parakeet-tdt-0.6b-v2-ONNX.
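In case it helps anyone following along, here is a minimal export sketch along those lines (assuming the NeMo toolkit is installed and using the nvidia/parakeet-tdt-0.6b-v2 checkpoint name from the model card; this is a sketch, not an official recipe):

import nemo.collections.asr as nemo_asr

# Download the checkpoint from the hub and export it with NeMo's built-in exporter.
# For RNNT/TDT models this typically produces separate encoder and decoder_joint ONNX files.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
asr_model.eval()
asr_model.export("parakeet-tdt-0.6b-v2.onnx")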
@hammeiam Feel free to open a feature request on GitHub. I don't have a lot of bandwidth at the moment, so hopefully a community member is interested in writing the inference code for it.
@Xenova is there an example to follow for writing the ONNX inference code, especially for a streaming implementation? Thanks
@nithinraok, I got it!
Currently, ONNX export only supports converting the encoder-decoder part of the model. How can I export the full model to ONNX, including the additional preprocessor and decoding?
Hi @Xenova. Sorry to bother you, but I'm getting the error RuntimeError: narrowing_error when exporting to ONNX on a T4. Would you know the solution? Here is my code:
import nemo.collections.asr as nemo_asr
from nemo.collections.asr.models import EncDecRNNTBPEModel

# Load the local .nemo checkpoint, freeze the weights, and export to ONNX on the GPU
asr_model = nemo_asr.models.ASRModel.restore_from("./parakeet-tdt-0.6b-v2.nemo")
assert isinstance(asr_model, EncDecRNNTBPEModel)
asr_model.freeze()
asr_model.to("cuda")
asr_model.export("./parakeet.onnx")
Thanks in advance!
Maybe someone will be interested: I recently made a Python package, onnx-asr, for ASR inference via ONNX with minimal dependencies (no PyTorch, NeMo, or transformers), and it supports parakeet-tdt-0.6b-v2.
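A minimal usage sketch (the load_model / recognize calls and the model name string are taken as assumptions from the package README; please check the onnx-asr docs for the exact identifiers):

# pip install onnx-asr (plus an onnxruntime backend)
import onnx_asr

# Model name below is assumed from the onnx-asr README; the package downloads the ONNX weights from the hub.
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v2")
print(model.recognize("test.wav"))  # prints the transcription of a 16 kHz mono WAV file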
My package, onnx-asr, is written in Python, so it won't work in the browser.
As for running this model in a browser in principle: in addition to the encoder and decoder, a preprocessor and decoding code are required. The encoder and decoder are saved when exporting the model to ONNX in NeMo, and the preprocessor in ONNX can be taken from my library. But you will have to write the decoding code in JS yourself.
Alternatively, you could save this model not with a TDT decoder but with CTC (as far as I understand, the model supports this); in that case the decoding code is quite trivial.
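To illustrate why CTC decoding is considered trivial, here is a minimal greedy-decoding sketch in Python (the blank index and the SentencePiece-style detokenization are placeholder assumptions and should be checked against the actual model):

import numpy as np

def ctc_greedy_decode(logits: np.ndarray, vocab: list[str]) -> str:
    # Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.
    # logits: (time, len(vocab) + 1) CTC head outputs; blank assumed at the last index (placeholder).
    blank_id = len(vocab)
    ids = logits.argmax(axis=-1)
    pieces = []
    prev = blank_id
    for i in ids:
        if i != prev and i != blank_id:
            pieces.append(vocab[i])
        prev = i
    return "".join(pieces).replace("▁", " ").strip()  # SentencePiece-style detokenization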
I wrote an audio-to-text demo in C++ based on onnx-asr, which can load the parakeet-tdt-0.6b-v2-onnx model for ASR inference. The results are quite good, and I hope you all like it.
@istupakov regarding your comment on CTC: the current model only supports TDT; it's not a hybrid model. We have published hybrid models before, but not this time.
https://github.com/NullSense/Parrator/
I made this simple tool: it runs as a daemon that you control with a shortcut to start/stop recording, with auto-paste supported and everything configurable.
Perhaps some of you will like it. I'm quite amazed at the speed of parakeet.
Daemon: Transcription Stats - Chars: 256, Words: 47
Daemon: Attempting to auto-paste from clipboard in 0.5 seconds...
Ensure a text field is active and focused!
Daemon: Paste simulated.
--- Performance Summary (Daemon Mode) ---
Total time (rec start to paste end): 15.092s
Recording duration: 14.136s
VAD processing duration: 0.002s
Audio processing after VAD: 0.003s
ASR Transcription duration: 0.334s
Clipboard & Paste duration: 0.583s
----------------------------------------
Some quick-and-dirty benchmarking, running on an RX 6750 XT under Windows 11.