VRAM usage

#1
by SerialKicked - opened

I don't use Gemma 27B models often, but for some reason every GGUF I try seems to consume a LOT more VRAM than even Qwen3 32B at the same context size and quantization. Is there something funky going on with Gemma 3 GGUF files? I tried with both KoboldCpp and LM Studio; it made no difference.
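One way to sanity-check this is to compare the per-token KV-cache footprint implied by each model's config. The sketch below uses my reading of the public config files (Gemma 3 27B: 62 layers, 16 KV heads, head dim 128; Qwen3 32B: 64 layers, 8 KV heads, head dim 128) — treat the numbers as approximate, not authoritative. Note too that Gemma 3 uses sliding-window attention on most layers; if a runtime allocates a full-context cache for those layers anyway, the measured usage will look even more inflated than this estimate.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token: 2 (K and V) * layers * kv_heads * head_dim."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Config values are assumptions from the published model configs.
gemma3_27b = kv_bytes_per_token(n_layers=62, n_kv_heads=16, head_dim=128)
qwen3_32b  = kv_bytes_per_token(n_layers=64, n_kv_heads=8,  head_dim=128)

ctx = 8192  # example context length
print(f"Gemma 3 27B: {gemma3_27b / 1024:.0f} KiB/token, "
      f"{gemma3_27b * ctx / 2**30:.2f} GiB at {ctx} ctx")
print(f"Qwen3 32B:   {qwen3_32b / 1024:.0f} KiB/token, "
      f"{qwen3_32b * ctx / 2**30:.2f} GiB at {ctx} ctx")
```

Under these assumptions the fp16 cache works out to roughly 496 KiB/token for Gemma 3 27B versus 256 KiB/token for Qwen3 32B, i.e. nearly double the KV VRAM at the same context length, which would be consistent with what you're seeing.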
