How to optimize the model if it is running locally
Hi! First of all, I find this project very interesting and want to congratulate you on it. Let me explain my situation.
I downloaded f5-tts locally to implement it in an API using FastAPI.
While watching stdout during a voice-cloning run, I noticed that the model downloads components from Hugging Face on every execution. It also reloads the model each time, and there even seems to be an audio conversion step involved.
When checking the GPU usage graph, there's a noticeable spike that lasts no more than 2 seconds. However, the entire execution takes around 19 seconds in total.
Is there a way to preload the model so it loads only once? Or is there any other way to optimize this process?
Your response would be greatly appreciated.
Hi,
Thanks for your interest in the demo!
1.) Forking the online demo is probably not the easiest way to build a FastAPI API. I would instead recommend using the Python API:
from f5_tts.api import F5TTS

api = F5TTS()
api.infer(
    ...
)
This implementation should make the model load only once.
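To make the "load once" part concrete, here is a minimal sketch of how you could wire this into a server process. The F5TTS call is shown only in a comment (so the sketch runs without the package installed); a placeholder object stands in for the model, and the startup-hook detail is an assumption about how you structure your FastAPI app.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Load the TTS model once per process; later calls reuse the cached instance."""
    # In the real app this would be:
    #     from f5_tts.api import F5TTS
    #     return F5TTS()
    # A placeholder object keeps the sketch runnable without the package.
    return object()


# Call once at startup (e.g. in a FastAPI startup/lifespan hook) so the
# load cost is paid before the first request arrives.
model = get_model()
```

Every request handler then calls `get_model()` and gets the same already-loaded instance back, so the download/load step from your logs happens only once per server process instead of once per execution.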
2.) I'm not sure about the GPU spike; it might be due to the transcription step. Before generating audio, the pipeline first transcribes the reference audio using OpenAI Whisper.