Incomplete GPU utilisation

#3
by UsernamePartialName - opened

Running with the recommended parameters (sans `-ngl 99`) on llama.cpp results in at most ~85% GPU saturation, while Qwen3, Gemma3, and pretty much any other model reach >= 97%.

Is there something special about this model's architecture that might prevent full GPU utilisation? I am using the ROCm backend.
