ModuleNotFoundError: No module named 'llama_inference_offload' on Mac M1 chip
Model trying to load: guanaco-65B.ggmlv3.q4_0.bin
Machine: Mac M1 Max, 64GB RAM
Error in WebUI:
Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 18, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 289, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 22, in <module>
    sys.exit(-1)
SystemExit: -1
I had the same error previously with guanaco-65B-GPTQ, and @TheBloke suggested using the GGML version on Mac. But even when I select the GGML version, for some reason it is still trying to load it as GPTQ. Am I missing anything?
I think I'm running out of memory too.
Error in command prompt:
INFO:Loading TheBloke_guanaco-65B-GGML...
INFO:llama.cpp weights detected: models/TheBloke_guanaco-65B-GGML/guanaco-65B.ggmlv3.q4_1.bin
INFO:Cache capacity is 0 bytes
llama.cpp: loading model from models/TheBloke_guanaco-65B-GGML/guanaco-65B.ggmlv3.q4_1.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 8192
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 64
llama_model_load_internal: n_layer = 80
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 22016
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.00 MB
error loading model: llama.cpp: tensor 'layers.1.ffn_norm.weight' is missing from model
llama_init_from_file: failed to load model
Exception ignored in: <function LlamaCppModel.__del__ at 0x16a6b7520>
Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/modules/llamacpp_model.py", line 23, in __del__
    self.model.__del__()
AttributeError: 'LlamaCppModel' object has no attribute 'model'
INFO:Loading TheBloke_guanaco-65B-GGML...
ERROR:Failed to load GPTQ-for-LLaMa
ERROR:See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md
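On the memory question: a 65B model at q4_1 should just fit in 64GB. A rough sketch of the arithmetic, using llama.cpp's Q4_1 block layout (32 weights stored as 16 bytes of 4-bit quants plus two fp16 scale/offset values, i.e. 20 bytes per 32 weights) and ignoring the fp16/fp32 norm tensors and the KV cache:

```python
def q4_1_size_gb(n_params: float) -> float:
    """Approximate weight storage for a Q4_1-quantized model, in GB."""
    bytes_per_weight = 20 / 32  # 20-byte block holds 32 weights
    return n_params * bytes_per_weight / 1e9

# ~40.6 GB of weights alone for a 65B model, before context/KV cache
print(round(q4_1_size_gb(65e9), 1))
```

That lines up with the ~40GB file size of the q4_1 download, so 64GB of RAM is workable but leaves limited headroom for the context cache and other apps.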
Yeah you can't use GPTQ on macOS.
I imagine you have the parameters set wrong: you don't set GPTQ parameters for GGML models. Leave all the GPTQ parameters at "None". Then text-generation-webui will load this model as a GGML model on the CPU.
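If you want to double-check that the file on disk really is GGML and not a mislabeled GPTQ safetensors/pt file, the first four bytes identify the container. A quick sketch using the magic constants of the pre-GGUF llama.cpp formats (your log shows "ggjt v3"):

```python
import struct

# Magic values of pre-GGUF llama.cpp containers, read as a little-endian uint32.
GGML_MAGIC = 0x67676D6C  # 'ggml' (original, unversioned)
GGMF_MAGIC = 0x67676D66  # 'ggmf' (versioned)
GGJT_MAGIC = 0x67676A74  # 'ggjt' (mmap-able; "ggjt v3" in the log above)

def detect_ggml(path: str):
    """Return the format name if the file looks like a GGML model, else None."""
    with open(path, "rb") as f:
        raw = f.read(4)
    if len(raw) < 4:
        return None
    (magic,) = struct.unpack("<I", raw)
    return {GGML_MAGIC: "ggml", GGMF_MAGIC: "ggmf", GGJT_MAGIC: "ggjt"}.get(magic)
```

Running this on guanaco-65B.ggmlv3.q4_1.bin should report "ggjt"; anything else means the webui is pointed at the wrong file.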