How to optimize the model if it is running locally
Hi! First of all, I find this project very interesting and want to congratulate you on it. Let me explain my situation.
I downloaded f5-tts locally to implement it in an API using FastAPI.
While watching stdout during a voice-cloning run, I noticed that the model downloads components from Hugging Face on every execution. It also reloads the model each time, and there even seems to be an audio conversion step involved.
When checking the GPU usage graph, there's a noticeable spike that lasts no more than 2 seconds. However, the entire execution takes around 19 seconds in total.
Is there a way to preload the model so it loads only once? Or is there any other way to optimize this process?
Your response would be greatly appreciated.
Hi,
Thanks for your interest in the demo!
1.) Forking the online demo is probably not the easiest way to build a FastAPI API. I would instead recommend using the Python API:
from f5_tts.api import F5TTS

api = F5TTS()
api.infer(
    ...
)
This implementation should make the model load only once.
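To make the "load once" part concrete, here is a minimal sketch of how you could wire this into a server process. The F5TTS call is shown only in a comment (so the sketch runs without the package installed); a placeholder object stands in for the model, and the startup-hook detail is an assumption about how you structure your FastAPI app.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Load the TTS model once per process; later calls reuse the cached instance."""
    # In the real app this would be:
    #     from f5_tts.api import F5TTS
    #     return F5TTS()
    # A placeholder object keeps the sketch runnable without the package.
    return object()


# Call once at startup (e.g. in a FastAPI startup/lifespan hook) so the
# load cost is paid before the first request arrives.
model = get_model()
```

Every request handler then calls `get_model()` and gets the same already-loaded instance back, so the download/load step from your logs happens only once per server process instead of once per execution.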
2.) I'm not sure about the GPU spike; it might be due to the transcription step. Before generating audio, the pipeline first transcribes the reference audio using OpenAI Whisper.