Real-Time Mic Transcription on free 2vCPU - using this model, check it out

#41

by WJ88 - opened 13 days ago

Discussion

WJ88

13 days ago

edited 13 days ago

Hi,

https://youtube.com/shorts/YV-waEbg1S8

I am working on a real-time mic version, and I have a working one ready to test:

https://huggingface.co/spaces/WJ88/NVIDIA-Parakeet-TDT-0.6B-v2-INT8-Real-Time-Mic-Transcription
*The whole point of this Space is to fit the model into 2 vCPUs :) and it works!

**There is a 4-second delay, but the audio is transcribed in less than 2 seconds. It will be improved, but it is still real-time compared to other Spaces, where you have to record first and then wait for the transcription.** (Any feedback appreciated) //Wojciech
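
A minimal sketch of how a fixed delay like this can arise, assuming the app collects microphone audio into 4-second windows before handing each window to the model. The `ChunkBuffer` class, the 16 kHz sample rate, and the `real_time_factor` helper are illustrative assumptions, not code taken from the Space:

```python
from typing import List

SAMPLE_RATE = 16_000   # assumed sample rate in Hz
WINDOW_SECONDS = 4.0   # the delay mentioned in the post


class ChunkBuffer:
    """Hypothetical helper: collects small mic blocks and releases full windows."""

    def __init__(self, sample_rate: int = SAMPLE_RATE,
                 window_seconds: float = WINDOW_SECONDS) -> None:
        self.sample_rate = sample_rate
        self.window_size = int(sample_rate * window_seconds)
        self._samples: List[float] = []

    def push(self, block: List[float]) -> List[List[float]]:
        """Add a block and return every full window that is now available."""
        self._samples.extend(block)
        windows = []
        while len(self._samples) >= self.window_size:
            windows.append(self._samples[: self.window_size])
            self._samples = self._samples[self.window_size:]
        return windows

    def pending_seconds(self) -> float:
        """Seconds of audio still waiting for the current window to fill."""
        return len(self._samples) / self.sample_rate


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean transcription keeps up with the microphone."""
    return processing_seconds / audio_seconds
```

With the figures from the post, `real_time_factor(2.0, 4.0)` gives 0.5, so under this assumption each window is finished before the next one has filled.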

The UI may not be pretty, but overall just click RECORD, speak, and watch the transcription. After you finish, please refresh the browser tab to free resources.
NOTE: the app is currently public, meaning every user's transcriptions accumulate and other users can see them. I am working on isolation, but it is what it is, it works :)

You can use NVIDIA-Parakeet-TDT-0.6B-v2 in real time without an NVIDIA card. I encourage you to try it, check the code (it's interesting that the model fits into 2 vCPUs), and finally clone it and build your own version on top of it! I will stick to optimizations rather than fancy features in my repo.

"I love Pain"

WJ88

12 days ago

Updated with preprocessing of audio from different platforms, so it should work for you now if it did not before. I also added a short video (less than one minute) on how to use the app - it is also a demo showing that the app works :)
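
A rough sketch of what such preprocessing might look like, assuming the goal is to turn whatever the browser delivers (stereo or mono, 16-bit integers, 44.1 or 48 kHz) into mono float samples at 16 kHz. The function names and the 16 kHz target are assumptions for illustration, and the linear interpolation here has no anti-alias filter, so a real app would likely use a proper resampler:

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # assumed model input rate in Hz


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single sample."""
    return [sum(frame) / len(frame) for frame in frames]


def int16_to_float(samples: Sequence[int]) -> List[float]:
    """Scale 16-bit PCM integers into the range [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def resample_linear(samples: Sequence[float], source_rate: int,
                    target_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation between neighbouring samples."""
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)
    last = len(samples) - 1
    out_len = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)   # sample at or before the position
        right = min(left + 1, last)  # next sample, clamped at the end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, one second of 48 kHz input passed through `resample_linear(samples, 48000)` comes out as 16000 samples.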

pronoobie

12 days ago

I tried the integer version of this model to fill HTML forms. If anyone wants to turn this into an okayish server, have a look at this repository:
https://github.com/deepanshu-yadav/voice-form-filler

WJ88

12 days ago

> I tried the integer version of this model to fill html forms. If anyone wants to convert this into a okayish server can have a look at this repository
> https://github.com/deepanshu-yadav/voice-form-filler

Hi,
I am not sure I understood correctly. My Space has INT8 in its name (I was running on int8 at first), but at the moment I have reverted to the regular (non-int8) model and am using other optimizations. So if you use my code directly, do not expect int8 (I need to add this to the description so as not to misinform users...)

There was a problem making it int8 because of deepcopy - I will do this, but first I have to work on managing user sessions ;)
BR//
Wojciech

pronoobie

11 days ago

Hi,
Just to clarify for people interested in this thread: I did not convert this model to the integer version myself. Someone did that here: https://github.com/k2-fsa/sherpa-onnx/tree/master/scripts/nemo/parakeet-tdt-0.6b-v2
I just wrapped a server around it for my use case, here: https://github.com/deepanshu-yadav/voice-form-filler/blob/main/asr_server.py
You are right, we need more optimizations in order to use it for real time. More importantly, I am trying really hard to fine-tune it for other languages.

Cheers

WJ88

10 days ago

UPDATE:

Clone it to your own private Space on Hugging Face and have your own transcriber.
Also, I encourage you to build your own app on top of it, it's just a beginning. I am going for an assistant that will use real-time transcription to help me, but for this I need a GPU, so there will probably be no more updates here.
It's a good base for projects. Check the link again: I have released the final version with user session separation.

https://huggingface.co/spaces/WJ88/NVIDIA-Parakeet-TDT-0.6B-v2-INT8-Real-Time-Mic-Transcription
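
For anyone building on top of this, here is one way user session separation could be sketched, assuming each browser tab gets its own ID and its own transcript list. The `SessionStore` class is hypothetical and not taken from the Space's code; if the app is built on Gradio (an assumption), per-session state via `gr.State` may serve the same purpose:

```python
import threading
import uuid
from typing import Dict, List


class SessionStore:
    """Hypothetical store that keeps one transcript list per session ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._transcripts: Dict[str, List[str]] = {}

    def new_session(self) -> str:
        """Create an empty transcript and return its random session ID."""
        session_id = uuid.uuid4().hex
        with self._lock:
            self._transcripts[session_id] = []
        return session_id

    def append(self, session_id: str, text: str) -> None:
        """Add newly transcribed text to one session only."""
        with self._lock:
            self._transcripts[session_id].append(text)

    def transcript(self, session_id: str) -> str:
        """Return the text accumulated for this session so far."""
        with self._lock:
            return " ".join(self._transcripts[session_id])

    def close(self, session_id: str) -> None:
        """Drop a session's data, e.g. when the tab is refreshed."""
        with self._lock:
            self._transcripts.pop(session_id, None)
```

The lock matters because audio from several users may arrive on different threads at the same time; without separate keys, everyone's text would land in one shared list, which is the behaviour described in the first post.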
