ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large · Instructions for streaming audio

viju2008

15 days ago

Dear Sir,

Please provide sample code for streaming speech recognition with this model.

6 days ago

It is very difficult to convert it to a streaming model. However, we can follow a client-server approach: https://github.com/deepanshu-yadav/voice-form-filler There I wrapped a model trained on English in a server and sent small audio packets to it. The server then returns the desired response. However, there is a catch:
The model is first converted to ONNX and then quantized to 8-bit integers, which makes inference faster.
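
For readers unfamiliar with the quantization step, here is a minimal sketch of the idea in plain Python, assuming symmetric per-tensor scaling. It is an illustration only and not necessarily the scheme that ONNX Runtime or the linked repository uses; the function names `quantize_int8` and `dequantize_int8` are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization, chosen for simplicity.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    q, scale = quantize_int8([0.5, -1.0, 0.25, 0.0])
    print(q, scale)
    print(dequantize_int8(q, scale))
```

Each weight is stored in one byte instead of four, and the rounding error per weight is at most half a quantization step (`scale / 2`).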

So we need to:

Convert this .nemo checkpoint to ONNX so it can run on ONNX Runtime.
Quantize the ONNX model to 8-bit integers.
Build a server that serves this model.
Write client recipes that call this model.
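
The last two steps (server and client) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from the linked repository. It assumes 16 kHz, 16-bit mono PCM, uses a length-prefixed framing that is invented for this example, runs both ends in one process over `socket.socketpair()`, and replaces the model with a hypothetical `fake_transcribe` placeholder.

```python
import socket
import struct
import threading

SAMPLE_RATE = 16000       # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM


def split_into_packets(pcm: bytes, packet_ms: int = 100):
    """Client side: yield fixed-duration slices of raw PCM audio."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * packet_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("socket closed early")
        buf += part
    return buf


def send_frame(sock, payload: bytes) -> None:
    """Send one frame: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_frame(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def fake_transcribe(pcm: bytes) -> str:
    """Hypothetical stand-in for the real quantized ONNX model call."""
    return f"received {len(pcm) // BYTES_PER_SAMPLE} samples"


def serve(sock) -> None:
    """Server side: collect packets until an empty frame, then reply."""
    audio = bytearray()
    while True:
        packet = recv_frame(sock)
        if not packet:    # empty frame marks the end of the utterance
            break
        audio.extend(packet)
    send_frame(sock, fake_transcribe(bytes(audio)).encode("utf-8"))


def run_demo(pcm: bytes) -> str:
    """Run client and server in one process over a connected socket pair."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server,))
    worker.start()
    try:
        for packet in split_into_packets(pcm):
            send_frame(client, packet)
        send_frame(client, b"")
        return recv_frame(client).decode("utf-8")
    finally:
        client.close()    # also unblocks the server if the client failed
        worker.join()
        server.close()


if __name__ == "__main__":
    print(run_demo(bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)))  # one second of silence
```

In a real deployment the socket pair would be replaced by a network transport such as TCP or WebSocket, and `fake_transcribe` by inference on the exported model.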

I am already working on this. Do you want to join me?

pronoobie

4 days ago

edited 4 days ago

An update to my proposal:
Here is another way to perform inference with this model. I have converted it into a quantized version, but it only runs in batched mode.
https://huggingface.co/pronoobie/indic_conformer_hi_float16_onnx_256_vocab

Now I am working on streaming mode.
Batched mode was easy. In streaming mode, however, you have to carry the intermediate state from one chunk to the next, handle the overlap between chunks, and deal with voice activity detection.
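
To make those three concerns concrete, here is a minimal sketch in plain Python. It is not the streaming implementation being developed in this thread. The names `overlapping_chunks`, `StreamingSession`, and `toy_step` are hypothetical, the energy threshold is a crude stand-in for a real voice activity detector, and `state` stands in for whatever cache the real encoder would need.

```python
import math


def rms(samples):
    """Root-mean-square energy of a chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def overlapping_chunks(samples, chunk_size, overlap):
    """Yield windows that share `overlap` samples with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not samples:
        return
    step = chunk_size - overlap
    for start in range(0, max(len(samples) - overlap, 1), step):
        yield samples[start:start + chunk_size]


class StreamingSession:
    """Carries model state across chunks and skips chunks judged silent."""

    def __init__(self, step_fn, vad_threshold=0.01):
        self.step_fn = step_fn              # (chunk, state) -> (output, new_state)
        self.vad_threshold = vad_threshold  # assumption: simple energy gate
        self.state = None
        self.outputs = []

    def feed(self, chunk):
        if rms(chunk) < self.vad_threshold:
            return None                     # silence: keep state, emit nothing
        output, self.state = self.step_fn(chunk, self.state)
        self.outputs.append(output)
        return output


def toy_step(chunk, state):
    """Hypothetical model step: just counts the voiced chunks seen so far."""
    count = (state or 0) + 1
    return f"chunk {count}", count


if __name__ == "__main__":
    audio = [0.0] * 8 + [0.5] * 8
    session = StreamingSession(toy_step)
    for chunk in overlapping_chunks(audio, chunk_size=4, overlap=0):
        session.feed(chunk)
    print(session.outputs)
```

With a real model, overlapping windows would also require merging or deduplicating the text decoded from the shared region, which this sketch does not attempt.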

Let me know if you want to collaborate.