I created an API server wrapper with web UI for Chatterbox TTS

#16
by devnen - opened

The Chatterbox model is truly impressive work by the Resemble AI team. The quality and capabilities are outstanding.

Seeing discussions about running Chatterbox locally, I wanted to share a project I built that might make it easier to get started: https://github.com/devnen/Chatterbox-TTS-Server

It's an enhanced FastAPI server that wraps the Chatterbox model with several useful features. Setup is straightforward: a standard pip install that works on Windows or Linux.

The goal was to create a simple way to run and experiment with Chatterbox without needing to piece the setup together yourself, while adding helpful features like chunking for long texts and voice consistency controls.

The server automatically downloads the model from Hugging Face Hub and features a modern web UI for parameter tuning and voice management, plus automatic text chunking for long documents. It includes predefined voices and voice cloning with reference files, plus seed control for consistent results. It also offers OpenAI-compatible and custom APIs, GPU/CPU support, and Docker deployment.

This builds on the architecture from my previous Dia-TTS-Server project but is specifically designed for the Chatterbox engine's capabilities:
https://github.com/devnen/Dia-TTS-Server

Hope you find it useful!

I usually see Gradio apps in this sort of space (AI/TTS model -> dedicated GUI). It appears the whole thing is done in JS on the frontend? That's the dream team for me: Python for its enjoyability and JS for its mature perfection (as far as display goes). So this wasn't NiceGUI, you just routed it up and freeballed?
And yes, so far it is the best model. I've used Resemble's other TTS tools (XTTS is often paired with their other major tool, Resemble Enhance; the two are put together by default in Daswer's XTTS GUI, which I contribute to as an app creator myself, as a hobbyist anyway). I'm hoping that they will stick with this one for a while and gain traction, because it would be a perfect TTS for something like LM Studio.

Thank you for your kind words. I went with vanilla JS instead of Gradio for better control over the UI and chunking workflow. It's just a FastAPI backend plus an HTML/JS frontend, which keeps it simple and lightweight.

I believe this is the most useful local TTS model since Kokoro. Chatterbox being open-source at this quality level could make a real difference.

@Devnen You are the GOAT. This is amazing. You take stuff that's already good and make it better. Shout out to Resemble AI for making stuff like this possible for people to play with and use. I will be using this on my website articles from now on. Goodbye Kokoro, for now.

Does it support streaming for inference?

@Devnen You are the GOAT. This is amazing. You take stuff that's already good and make it better. Shout out to Resemble AI for making stuff like this possible for people to play with and use. I will be using this on my website articles from now on. Goodbye Kokoro, for now.

Thank you so much. Really appreciate the kind words. Hope it works well for your website articles. I would love to hear how it performs for that use case.

Does it support streaming for inference?

Not currently - it generates the full audio before returning it.

You can work around this by processing smaller text chunks and pulling audio for each chunk from the /tts endpoint, then playing them continuously. That gets you pretty close to a streaming experience.
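To make that workaround concrete, here is a minimal sketch in Python. It is only an illustration, not code from the server: the helper names (`split_into_chunks`, `stream_chunks`, `synthesize_via_http`), the `http://localhost:8000` address, and the JSON body with a single `text` field are all assumptions, so check the repo's API docs for the real request format of /tts.

```python
import json
import re
import urllib.request
from typing import Callable, Iterator, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def stream_chunks(
    text: str,
    synthesize: Callable[[str], bytes],
    max_chars: int = 200,
) -> Iterator[bytes]:
    """Yield audio for each chunk as soon as it has been generated."""
    for chunk in split_into_chunks(text, max_chars):
        yield synthesize(chunk)


def synthesize_via_http(chunk: str, base_url: str = "http://localhost:8000") -> bytes:
    """Request audio for one chunk from the /tts endpoint.

    ASSUMPTION: the address and the {"text": ...} JSON payload are
    hypothetical; adjust them to match the server's documented API.
    """
    request = urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps({"text": chunk}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A player would then loop over `stream_chunks(long_text, synthesize_via_http)` and queue each piece of audio as it arrives. The first chunk starts playing while later ones are still being generated, which is where the near-streaming feel comes from. Smaller chunks lower the time to first audio, at the cost of more requests.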

I used your version on this article here: https://aibrainworx.com/medgemma-unveiled-googles-open-source-ai-for-medical-innovation . I also updated the Chatterbox article and audio to include a link to your GitHub repo: https://aibrainworx.com/chatterbox-open-source-expressive-tts . Until I can learn to do what you do (I'm trying), it's the least I can do. Thanks again. Let me know what you think.
