Space: freddyaboulton/gemini-audio-video-chat

Faster image encoding using opencv-python (cv2); the UI skips the API key page if the key is already provided as an environment variable; a Twilio account is now optional; and the Twilio and Gemini connection status is displayed. Video reliability is also improved compared with the main branch at the time of writing, but it can still freeze.

ahundt changed pull request status to open Feb 22

ahundt

Feb 24

Hi @freddyaboulton, I fixed several bugs and improved the UI a bit (e.g. a Twilio account is now optional), so I believe these changes are worth considering for merging.

Two additional items:

1. There is an outstanding bug that applies to both your original code and these changes. After hitting record, the video tends to stop updating after several seconds on my 2023 MacBook Pro (M3 Max) running up-to-date Chrome. The audio works fine for the duration of the session. By contrast, I'm able to run live streaming with both video and audio on aistudio.google.com using the "stream realtime" option on the same machine without issue.
2. I'm trying to add the text modality that's in AI Studio's "stream realtime" API so the text is printed too (e.g. config = {"response_modalities": ["AUDIO", "TEXT"]}), but I haven't had luck configuring the calls correctly.

Do you have any suggestions for both of these?

Thanks!

freddyaboulton

Owner Feb 24

Hi @ahundt - thank you for the great work on this

> There is an outstanding bug that applies to both your original code and these changes

Yes, I saw the bug that you opened on my repo as well as on aiortc. I have not dug into it yet; hopefully the aiortc maintainers can give some clarity.

> I'm trying to add the additional text modality that's in aistudio's "stream realtime" API to print that too

I also tried to get the text response and display it in a chatbot UI, but I couldn't get it to work.

> e.g. a twilio account is now optional,

Are you sure? Twilio is optional for local development and for deploying on a server that is not behind a firewall, but for Spaces/Heroku/Render the TURN credentials are needed.
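
To make the "optional" case concrete, here is a minimal sketch of how an app might fall back to no TURN relay when Twilio credentials are absent. It is not the code from this PR or from the library. The environment variable names `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN`, the function name `resolve_rtc_configuration`, and the injected `fetch_ice_servers` callable (standing in for whatever actually requests TURN servers from Twilio) are all assumptions made for illustration.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_rtc_configuration(
    fetch_ice_servers: Callable[[str, str], list],
    env: Mapping[str, str] = os.environ,
) -> Optional[dict]:
    """Return an RTC configuration dict, or None when Twilio is not configured.

    `fetch_ice_servers` is a hypothetical callable that takes the account SID
    and auth token and returns a list of ICE server dicts. It is injected so
    this sketch never talks to the network itself.
    """
    sid = env.get("TWILIO_ACCOUNT_SID")      # assumed variable name
    token = env.get("TWILIO_AUTH_TOKEN")     # assumed variable name
    if not sid or not token:
        # Local development, or a server that is not behind a firewall:
        # run without a TURN relay.
        return None
    # Hosted deployments: hand the relay servers to the WebRTC layer.
    return {"iceServers": fetch_ice_servers(sid, token)}
```

With this shape, a hosted deployment would set both variables, while a local run would leave them unset and get `None` back.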

Can you also explain the motivation for the changes in this PR? What bugs do they fix? That would help me better understand and improve the core of the library.

ahundt

Feb 26

For the bugs: some of the internal connection state wasn't visible to the end user, so one could not tell why the system wasn't working. The API key screen isn't needed if the key is already provided on the command line. OpenCV should also be considerably faster and more robust for image conversion.
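
As a rough illustration of the API key point, the check could look something like the sketch below. It is not the exact code in the PR. The variable name `GEMINI_API_KEY` and both function names are assumptions chosen for the example.

```python
import os
from typing import Mapping, Optional


def api_key_from_env(
    env: Mapping[str, str] = os.environ,
    name: str = "GEMINI_API_KEY",  # assumed variable name, not confirmed by the PR
) -> Optional[str]:
    """Return the API key from the environment, or None if it is unset or blank."""
    key = env.get(name, "").strip()
    return key or None


def needs_api_key_page(env: Mapping[str, str] = os.environ) -> bool:
    """True when the UI should still show the API key page."""
    return api_key_from_env(env) is None
```

Launching with the variable set on the command line would then go straight to the streaming view, while an unset or blank value would keep the key page.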

I suspect the aiortc bug can be triggered by dropped network packets, e.g. by moving to the edge of Wi-Fi range or by using a testing tool that deliberately forces packets to drop.

I've reported the Gemini API issue here; an additional comment describing your experience would be helpful!
https://github.com/googleapis/python-genai/issues/380

I think there is one other bug I'm forgetting... sorry about that!

freddyaboulton

Owner Feb 26

Hi @ahundt - can you open a PR on GitHub? It will be synced to this new space. FastRTC is the new name of the gradio_webrtc package.

> For the bugs, some of the internal connection state wasn't visible to the end user, so one could not tell why the system wasn't working

Yes, sorry about that. If you install the latest version of the package, fastrtc==0.0.5.post2, errors should automatically be displayed in the UI.

I welcome all the other contributions though! And it looks like there is a PR open for the aiortc bug you found!!

freddyaboulton changed pull request status to closed Feb 26
