VRAM usage

#1
by SerialKicked - opened

I don't use Gemma 27B models often, but for some reason every GGUF I try seems to consume a LOT more VRAM than even Qwen3 32B at the same context size and quantization. Is there something funky going on with Gemma 3 GGUF files? I tried with both KoboldCpp and LM Studio; it made no difference.
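One way to sanity-check this is to compare the per-token KV-cache footprint implied by each model's config. The sketch below uses my reading of the public config files (Gemma 3 27B: 62 layers, 16 KV heads, head dim 128; Qwen3 32B: 64 layers, 8 KV heads, head dim 128) — treat the numbers as approximate, not authoritative. Note too that Gemma 3 uses sliding-window attention on most layers; if a runtime allocates a full-context cache for those layers anyway, the measured usage will look even more inflated than this estimate.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache per token: 2 (K and V) * layers * kv_heads * head_dim."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Config values are assumptions from the published model configs.
gemma3_27b = kv_bytes_per_token(n_layers=62, n_kv_heads=16, head_dim=128)
qwen3_32b  = kv_bytes_per_token(n_layers=64, n_kv_heads=8,  head_dim=128)

ctx = 8192  # example context length
print(f"Gemma 3 27B: {gemma3_27b / 1024:.0f} KiB/token, "
      f"{gemma3_27b * ctx / 2**30:.2f} GiB at {ctx} ctx")
print(f"Qwen3 32B:   {qwen3_32b / 1024:.0f} KiB/token, "
      f"{qwen3_32b * ctx / 2**30:.2f} GiB at {ctx} ctx")
```

Under these assumptions the fp16 cache works out to roughly 496 KiB/token for Gemma 3 27B versus 256 KiB/token for Qwen3 32B, i.e. nearly double the KV VRAM at the same context length, which would be consistent with what you're seeing.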
