Incomplete GPU utilisation
#3 · opened by UsernamePartialName
Running with the recommended parameters (sans `-ngl 99`) on llama.cpp results in at most ~85% GPU saturation, while Qwen3, Gemma3, and pretty much any other model reach >= 97%.
Is there something special about its architecture that might prevent full GPU utilisation? I am using the ROCm backend.
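For reference, this is roughly how I run and measure it (the model path and binary location are placeholders for whatever you have locally; `-ngl` is llama.cpp's flag for the number of layers offloaded to the GPU):

```shell
# Run llama.cpp with all layers offloaded to the GPU
# (./model.gguf is a placeholder path, prompt is arbitrary)
./llama-cli -m ./model.gguf -ngl 99 -p "Hello"

# In another terminal, poll GPU busy percentage once per second
# via ROCm's SMI tool
watch -n 1 rocm-smi --showuse
```

With this setup the busy percentage from `rocm-smi` tops out around 85% for this model, versus >= 97% for the others mentioned above.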