Incomplete GPU utilisation

#3
by UsernamePartialName - opened

Running with the recommended parameters (sans `-ngl 99`) on llama.cpp results in at most ~85% GPU saturation, while Qwen3, Gemma3, and pretty much any other model reach >= 97%.

Is there something special about this model's architecture that might prevent full GPU utilisation? I am using the ROCm backend.
