Instructions for streaming audio

#2
by viju2008 - opened

Dear Sir,

Please provide sample code for streaming audio speech recognition.

Converting this model to a streaming model is very difficult. However, we can follow a client-server approach, as in https://github.com/deepanshu-yadav/voice-form-filler. There I wrapped a model trained on English in a server and sent small audio packets to it. Finally, the server returns the desired response. There is a catch, however:
the model is first converted to ONNX and then quantized to 8-bit integers, which makes inference faster.
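
For anyone unfamiliar with what "quantized to 8-bit integers" means, here is a minimal sketch of the arithmetic, assuming a simple symmetric per-tensor scheme. This is only an illustration in plain Python; it is not necessarily the scheme used in the linked repository, and in practice a quantization tool would do this on the ONNX graph. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale so the largest magnitude lands on +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]   # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The round-trip error per weight is at most half a quantization step.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q, scale, max_error)
```

The speed-up comes from storing and multiplying small integers instead of 32-bit floats, at the cost of the small rounding error shown above.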

So we need to

  1. Convert this .nemo model to ONNX, for use with ONNX Runtime.
  2. Quantize it to an 8-bit integer model.
  3. Build a server that serves this model.
  4. Write client recipes that call this model.
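
Steps 3 and 4 can be sketched roughly as follows. This is a minimal in-process illustration, assuming 16 kHz 16-bit mono PCM and 100 ms packets; the network transport is left out entirely, and `FakeAsrServer` is a hypothetical stand-in for a server that would actually run the quantized model.

```python
from typing import Iterator

SAMPLE_RATE = 16000     # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM
PACKET_MS = 100         # assumption: 100 ms per packet


def packetize(pcm: bytes, packet_ms: int = PACKET_MS) -> Iterator[bytes]:
    """Client side: split raw PCM bytes into small fixed-duration packets."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * packet_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


class FakeAsrServer:
    """Hypothetical stand-in for the server; no model is run here."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def receive(self, packet: bytes) -> None:
        # A real server would feed the packet to the model as it arrives.
        self.buffer.extend(packet)

    def finish(self) -> dict:
        seconds = len(self.buffer) / (SAMPLE_RATE * BYTES_PER_SAMPLE)
        return {"received_seconds": seconds, "text": "<transcript placeholder>"}


pcm = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE // 4)   # 0.25 s of silence
server = FakeAsrServer()
packets = list(packetize(pcm))
for packet in packets:
    server.receive(packet)
response = server.finish()
print(len(packets), response)
```

In a real setup the `receive` and `finish` calls would go over a socket or HTTP connection, and `finish` would return the recognized text.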

I am already working on this task. Do you want to join me?

An update to my proposal:
Here is another way to perform inference with this model. I have converted it into a quantized version, but this one runs in batched mode.
https://huggingface.co/pronoobie/indic_conformer_hi_float16_onnx_256_vocab

Now I am working on streaming mode.
Batched mode was easy. But in streaming mode you have to maintain the intermediate state across chunks, and think about chunk overlap and voice activity detection.
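
To make those three concerns concrete, here is a rough sketch in plain Python, assuming audio is a list of float samples. `fake_model_step` is a hypothetical placeholder for one streaming inference step (a real model would take and return cache tensors), and the energy threshold is an arbitrary assumption, not a tuned value.

```python
def overlapping_chunks(samples, chunk, overlap):
    """Yield windows of `chunk` samples; neighbours share `overlap` samples."""
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be in [0, chunk)")
    step = chunk - overlap
    for start in range(0, len(samples), step):
        yield samples[start:start + chunk]
        if start + chunk >= len(samples):
            break   # the last window already reached the end of the audio


def is_speech(window, threshold=0.01):
    """Very simple energy-based voice activity detection."""
    energy = sum(x * x for x in window) / len(window)
    return energy > threshold


def fake_model_step(window, state):
    """Hypothetical stand-in for one streaming step: returns (output, new_state)."""
    new_state = {"chunks_seen": state["chunks_seen"] + 1}
    return f"chunk{new_state['chunks_seen']}", new_state


def stream(samples, chunk, overlap):
    state = {"chunks_seen": 0}   # intermediate state carried between chunks
    outputs = []
    for window in overlapping_chunks(samples, chunk, overlap):
        if not is_speech(window):
            continue             # skip silence and leave the state untouched
        output, state = fake_model_step(window, state)
        outputs.append(output)
    return outputs


audio = [0.0] * 4 + [0.5] * 6    # made-up signal: silence, then "speech"
result = stream(audio, chunk=4, overlap=1)
print(result)
```

The point of the sketch is the control flow: each chunk receives the state left by the previous one, the overlap gives the model context at chunk boundaries, and silent chunks never reach the model.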

Let me know if you want to collaborate.
