Multiple zeroGPU calls in same code

#155
by hen - opened

I have a Space that uses two models for RAG: an embedding model and an LLM. I wrapped both my retreive() function (which uses the embedding model) and my llm_inference() function (which uses the LLM) in a @spaces.GPU() decorator. Nothing prevents me from doing so, but the Space is pretty slow and seems to cold-start constantly between the two models. Any explanation of how @spaces.GPU() works across multiple invocations in the same code would be highly appreciated. Thanks!

ZeroGPU Explorers org

As I understand the documentation, that's the expected syntax. ZeroGPU can sometimes be a bit slow because it is a shared public resource with a lot of traffic.

Check whether the issue persists at different times of day.
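
If it does persist, one thing that may be worth trying is wrapping both steps in a single decorated function, so that each query goes through one @spaces.GPU call instead of two. The sketch below only shows that structure and is not a confirmed fix. The names `retrieve`, `llm_inference`, `rag_answer`, and `DOCS` are hypothetical, the two model steps are replaced by toy stand-ins (word overlap and string formatting) so the example runs anywhere, and the no-op `gpu` fallback is an assumption added so the code also runs where the `spaces` package is not installed.

```python
try:
    import spaces  # available on Hugging Face ZeroGPU Spaces
    gpu = spaces.GPU
except ImportError:
    # Assumption for local runs: a no-op stand-in for the decorator.
    def gpu(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

# Hypothetical toy corpus standing in for the real document store.
DOCS = [
    "Embedding models map text to vectors.",
    "Language models generate text from a prompt.",
]

def _tokens(text):
    # Lowercase and strip basic punctuation before splitting into words.
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(query, k=1):
    # Stand-in for the embedding step: rank documents by shared words.
    query_tokens = _tokens(query)
    ranked = sorted(DOCS, key=lambda d: len(query_tokens & _tokens(d)), reverse=True)
    return ranked[:k]

def llm_inference(query, context):
    # Stand-in for the LLM step: format the retrieved context into a reply.
    return f"Question: {query} Context: {' '.join(context)}"

@gpu(duration=60)  # one decorated entry point covering both steps
def rag_answer(query):
    context = retrieve(query)
    return llm_inference(query, context)

print(rag_answer("What do embedding models do?"))
```

In a real Space, `retrieve` and `llm_inference` would call the actual models and would themselves be left undecorated, with only the outer function carrying the decorator. Whether this removes the slowdown depends on where the time is actually going, so it is worth timing both variants.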
