I get this error when running on a MacBook Air M2:
```
Traceback (most recent call last):
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/GPTQ_loader.py", line 17, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/models.py", line 74, in load_model
    output = load_func_map[loader](model_name)
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/models.py", line 278, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/GPTQ_loader.py", line 21, in <module>
    sys.exit(-1)
SystemExit: -1
```
GPTQ models aren't properly supported on macOS. The one-click installer doesn't install any GPTQ library, which is why you're getting this error. You could install one manually, but there's no GPU acceleration on macOS, so it would be really slow.
On macOS, please use GGML models instead. To get GPU acceleration, you'll need to manually compile llama-cpp-python with Metal support: https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal
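For reference, a minimal sketch of that build, based on the flags documented in the llama-cpp-python README at the time (the exact CMake flag names may have changed in later versions, so check the link above):

```bash
# Rebuild llama-cpp-python from source with the Metal backend enabled.
# FORCE_CMAKE=1 forces a source build; --no-cache-dir avoids reusing a
# previously cached CPU-only wheel.
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```

After that, load your GGML model with the llama.cpp loader and set n-gpu-layers above 0 so layers are offloaded to the GPU.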
An even easier option is LM Studio, which has full GPU acceleration on macOS and supports all GGML models: https://lmstudio.ai/