Low GPU Utilization during inference?
Running on WSL-Ubuntu:
GPU utilization never reaches 60%; it usually stays in the 45-55% range.
I have tried this on the 4b-it and the 12b-it and noticed the same.
This happens both with plain text prompts and with image+text prompts.
I have reproduced this with the default sample script shown in the model card, as well as various other ways of trying to load/run the model.
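For reference, the way I am loading and running the model looks roughly like this (a paraphrase of the model-card pipeline example rather than my exact script; the prompt is just a placeholder):

```python
import torch
from transformers import pipeline

# Load the instruction-tuned checkpoint; the -it Gemma 3 models are multimodal,
# so the image-text-to-text pipeline handles both text-only and image+text prompts.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "Summarise how attention works in transformers."}],
    }
]

# GPU utilization stays in the 45-55% range while this runs.
output = pipe(text=messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])
```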
I am unsure if this is expected behaviour or not.
If someone can assist me with this, I would greatly appreciate it.
Thank you.
Hi @BagelBig,
Apologies for the late reply, and welcome to the Gemma family of Google open-source models. What you are seeing is a common observation: during inference, smaller models such as Gemma 4B-it and 12B-it often do not push GPU utilization as high as larger checkpoints like the 27B model. Single-request, token-by-token generation with a small model tends to be limited by memory bandwidth and host-side work rather than by raw compute, so on a powerful GPU it is quite normal for utilization to sit in the range you measured.
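If you want to confirm what the GPU is actually doing while the model generates, a small NVML sampler can log the same utilization figure that nvidia-smi reports. This is only a minimal sketch; the pynvml package and device index 0 are assumptions about your setup:

```python
import threading
import time

import pynvml  # assumption: the pynvml / nvidia-ml-py package is installed

def sample_gpu_utilization(stop_event, samples, device_index=0, interval=0.2):
    """Poll NVML in the background and record the 'GPU util' percentage."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while not stop_event.is_set():
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        samples.append(util.gpu)  # % of the sample window with a kernel running
        time.sleep(interval)
    pynvml.nvmlShutdown()

stop_event, samples = threading.Event(), []
sampler = threading.Thread(target=sample_gpu_utilization, args=(stop_event, samples))
sampler.start()

# ... run your pipe(...) / model.generate(...) call here ...

stop_event.set()
sampler.join()
if samples:
    print(f"mean GPU util: {sum(samples) / len(samples):.1f}%, peak: {max(samples)}%")
```

If the sampled utilization climbs noticeably when you batch several prompts into a single call, that is a good sign the limiting factor is single-request decoding rather than the GPU itself.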
If you require any further assistance with the Gemma models, please feel free to reach out.
Thanks.