Low GPU Utilization during inference?
Running on WSL-Ubuntu:
GPU utilization never reaches 60%; it usually stays in the 45-55% range.
I have tried this on the 4b-it and the 12b-it and noticed the same.
This happens both with plain text prompts and with image+text prompts.
I have reproduced this with the default sample script shown in the model card, as well as various other ways of trying to load/run the model.
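For reference, the way I am loading and running the model looks roughly like this (a paraphrase of the model-card pipeline example rather than my exact script; the prompt is just a placeholder):

```python
import torch
from transformers import pipeline

# Load the instruction-tuned checkpoint; the -it Gemma 3 models are multimodal,
# so the image-text-to-text pipeline handles both text-only and image+text prompts.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "Summarise how attention works in transformers."}],
    }
]

# GPU utilization stays in the 45-55% range while this runs.
output = pipe(text=messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])
```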
I am unsure if this is expected behaviour or not.
If someone can assist me with this, I would greatly appreciate it.
Thank you.
Hi @BagelBig,
Apologies for the late reply, and welcome to the Gemma family of Google open-source models. What you are seeing is a common observation: during inference, smaller models such as Gemma 4B-it and 12B-it often do not push GPU utilization as high as larger checkpoints like the 27B model. Single-request, token-by-token generation with a small model tends to be limited by memory bandwidth and host-side work rather than by raw compute, so on a powerful GPU it is quite normal for utilization to sit in the range you measured.
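If you want to confirm what the GPU is actually doing while the model generates, a small NVML sampler can log the same utilization figure that nvidia-smi reports. This is only a minimal sketch; the pynvml package and device index 0 are assumptions about your setup:

```python
import threading
import time

import pynvml  # assumption: the pynvml / nvidia-ml-py package is installed

def sample_gpu_utilization(stop_event, samples, device_index=0, interval=0.2):
    """Poll NVML in the background and record the 'GPU util' percentage."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while not stop_event.is_set():
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        samples.append(util.gpu)  # % of the sample window with a kernel running
        time.sleep(interval)
    pynvml.nvmlShutdown()

stop_event, samples = threading.Event(), []
sampler = threading.Thread(target=sample_gpu_utilization, args=(stop_event, samples))
sampler.start()

# ... run your pipe(...) / model.generate(...) call here ...

stop_event.set()
sampler.join()
if samples:
    print(f"mean GPU util: {sum(samples) / len(samples):.1f}%, peak: {max(samples)}%")
```

If the sampled utilization climbs noticeably when you batch several prompts into a single call, that is a good sign the limiting factor is single-request decoding rather than the GPU itself.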
If you require any further assistance with the Gemma models, please feel free to reach out.
Thanks.