Which quantized version can run on a Mac computer with 32GB of memory?
I tried the smallest Q1S version, but it threw an error.
what were you using? llama.cpp?
I'm using ollama.
The smallest one is 33GB, so I don't think you can fit any of them in a 32GB Mac.
Maybe with swap, but that wouldn't be very kind to your disk.
I know that llama.cpp can use mmap, and I was able to run a 671B R1 on just 32GB of RAM. It was reading from disk at full speed.
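The trick is that mmap lets the OS page the model file in from disk on demand instead of copying it all into RAM up front. A minimal Python sketch of the same mechanism (the file size and path here are just stand-ins, not anything from llama.cpp itself):

```python
import mmap
import os
import tempfile

# Create a file standing in for a large model file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * (16 * 1024 * 1024))  # 16 MiB of data

with open(path, "rb") as f:
    # Map the whole file without reading it into memory;
    # pages are faulted in from disk only when touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = mm[0]   # touching a byte pages in only that region
    last = mm[-1]
    mm.close()

os.remove(path)
print(first == ord("x") and last == ord("x"))
```

llama.cpp does this by default for GGUF files, which is why a model somewhat larger than RAM can still run: only the pages actually accessed need to be resident at any moment.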
In your case you might be lucky: some experts are rarely used, so the model may mostly stay in RAM even though it's slightly larger than RAM. I'd say go for it and try it!
PS: I have no idea what can be configured in ollama. It's more user-friendly but also more restricted in its options. And it's based on llama.cpp under the hood :-D