Which quantized version can run on a Mac computer with 32GB of memory?
I tried the smallest Q1S version, but it threw an error.
what were you using? llama.cpp?
I'm using ollama.
The smallest one is 33GB, so I don't think you can fit any of them in a 32GB Mac.
Maybe with swap, but that wouldn't be very kind to your disk.
I know that llama.cpp can use mmap, and I was able to run a 671B R1 on just 32GB of RAM. It was reading from disk at full speed.
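The trick is that mmap lets the OS page the model file in from disk on demand instead of copying it all into RAM up front. A minimal Python sketch of the same mechanism (the file size and path here are just stand-ins, not anything from llama.cpp itself):

```python
import mmap
import os
import tempfile

# Create a file standing in for a large model file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * (16 * 1024 * 1024))  # 16 MiB of data

with open(path, "rb") as f:
    # Map the whole file without reading it into memory;
    # pages are faulted in from disk only when touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = mm[0]   # touching a byte pages in only that region
    last = mm[-1]
    mm.close()

os.remove(path)
print(first == ord("x") and last == ord("x"))
```

llama.cpp does this by default for GGUF files, which is why a model somewhat larger than RAM can still run: only the pages actually accessed need to be resident at any moment.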
In your case you might be lucky: some experts are rarely used, so the model may mostly stay in RAM even though it's slightly larger than RAM. I'd say go for it and try it!
PS: I have no idea what can be configured in ollama. It's more user-friendly but also more restricted in its options. And it's based on llama.cpp under the hood :-D