ONNX conversion
Trying to convert the parakeet-tdt-0.6b-v2 model to ONNX format for deployment, but I'm unsure how to proceed with the export. Has anyone successfully converted this model or can share guidance on the correct steps?
Thanks @nithinraok - exporting was as simple as following that guide. I've uploaded the converted model to https://huggingface.co/onnx-community/parakeet-tdt-0.6b-v2-ONNX.
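In case it helps anyone following along, here is a minimal export sketch along those lines (assuming the NeMo toolkit is installed and using the nvidia/parakeet-tdt-0.6b-v2 checkpoint name from the model card; this is a sketch, not an official recipe):

import nemo.collections.asr as nemo_asr

# Download the checkpoint from the hub and export it with NeMo's built-in exporter.
# For RNNT/TDT models this typically produces separate encoder and decoder_joint ONNX files.
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
asr_model.eval()
asr_model.export("parakeet-tdt-0.6b-v2.onnx")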
@hammeiam Feel free to open a feature request on GitHub. I don't have a lot of bandwidth at the moment, so hopefully a community member is interested in writing the inference code for it.
@Xenova is there an example to follow for writing the ONNX inference code, especially for a streaming implementation? Thanks
@nithinraok, I got it!
Currently, ONNX export only supports converting the encoder-decoder part of the model. How can I export the full model to ONNX, including the additional preprocessor and decoding?
Hi @Xenova. Sorry to bother you, but I'm getting the error RuntimeError: narrowing_error when exporting to ONNX on a T4. Would you know the solution? Here is my code:
import nemo.collections.asr as nemo_asr
from nemo.collections.asr.models import EncDecRNNTBPEModel

# Load the local .nemo checkpoint, freeze the weights, and export to ONNX on the GPU
asr_model = nemo_asr.models.ASRModel.restore_from("./parakeet-tdt-0.6b-v2.nemo")
assert isinstance(asr_model, EncDecRNNTBPEModel)
asr_model.freeze()
asr_model.to("cuda")
asr_model.export("./parakeet.onnx")
Thanks in advance!
Maybe someone will be interested: I recently made a Python package, onnx-asr, for ASR inference via ONNX with minimal dependencies (no PyTorch, NeMo, or transformers), and it supports parakeet-tdt-0.6b-v2.
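A minimal usage sketch (the load_model / recognize calls and the model name string are taken as assumptions from the package README; please check the onnx-asr docs for the exact identifiers):

# pip install onnx-asr (plus an onnxruntime backend)
import onnx_asr

# Model name below is assumed from the onnx-asr README; the package downloads the ONNX weights from the hub.
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v2")
print(model.recognize("test.wav"))  # prints the transcription of a 16 kHz mono WAV file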
My package, onnx-asr, is written in Python, so it won't work in the browser.
As for running this model in a browser in principle: in addition to the encoder and decoder, a preprocessor and decoding code are required. The encoder and decoder are saved when exporting the model to ONNX in NeMo, and the preprocessor in ONNX can be taken from my library. But you will have to write the decoding code in JS yourself.
Alternatively, you could save this model not with a TDT decoder but with CTC (as far as I understand, the model supports this); in that case the decoding code is quite trivial.
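To illustrate why CTC decoding is considered trivial, here is a minimal greedy-decoding sketch in Python (the blank index and the SentencePiece-style detokenization are placeholder assumptions and should be checked against the actual model):

import numpy as np

def ctc_greedy_decode(logits: np.ndarray, vocab: list[str]) -> str:
    # Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.
    # logits: (time, len(vocab) + 1) CTC head outputs; blank assumed at the last index (placeholder).
    blank_id = len(vocab)
    ids = logits.argmax(axis=-1)
    pieces = []
    prev = blank_id
    for i in ids:
        if i != prev and i != blank_id:
            pieces.append(vocab[i])
        prev = i
    return "".join(pieces).replace("▁", " ").strip()  # SentencePiece-style detokenization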
I wrote an audio-to-text demo in C++ based on onnx-asr, which can load the parakeet-tdt-0.6b-v2-onnx model for ASR inference. The results are quite good, and I hope you all like it.
@istupakov regarding your comment on CTC: the current model only supports TDT; it's not a hybrid model. We have published hybrid models before, but not this time.
https://github.com/NullSense/Parrator/
I made this simple tool: it runs as a daemon that you control with a shortcut to start/stop recording, with auto-paste supported and everything configurable.
Perhaps some of you will like it. I'm quite amazed at the speed of parakeet.
Daemon: Transcription Stats - Chars: 256, Words: 47
Daemon: Attempting to auto-paste from clipboard in 0.5 seconds...
Ensure a text field is active and focused!
Daemon: Paste simulated.
--- Performance Summary (Daemon Mode) ---
Total time (rec start to paste end): 15.092s
Recording duration: 14.136s
VAD processing duration: 0.002s
Audio processing after VAD: 0.003s
ASR Transcription duration: 0.334s
Clipboard & Paste duration: 0.583s
----------------------------------------
Some quick-and-dirty benchmarking, running on an RX 6750 XT under Windows 11.