inference example #2
by rrkotik - opened
Hello, can you provide an example of how to run inference with this model?
I tried something like this:
model = transformers.LlamaForCausalLM.from_pretrained("Neko-Institute-of-Science/LLaMA-7B-4bit-128g", load_in_8bit=True, device_map='auto')
and I received error:
ValueError: weight is on the meta device, we need a `value` to put in on 0.
You will need GPTQ-for-LLaMa to run this. The weights are GPTQ-quantized (4-bit, group size 128), so loading them with `load_in_8bit=True` (which uses bitsandbytes) will not work; `transformers.LlamaForCausalLM.from_pretrained` alone cannot reconstruct GPTQ-packed weights, which is why the layers stay on the meta device.
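A rough sketch of what that entails, for reference: the repository URL is real, but the script name, flags, and checkpoint filename below are assumptions based on the GPTQ-for-LLaMa README and should be checked against the actual files in the Neko-Institute-of-Science/LLaMA-7B-4bit-128g repo.

```shell
# Clone the GPTQ-for-LLaMa repo and install its dependencies.
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt

# Run its inference script against the downloaded model directory.
# --wbits and --groupsize must match how the checkpoint was quantized
# (here: 4-bit, group size 128). The checkpoint filename is hypothetical;
# use whatever .safetensors/.pt file the model repo actually contains.
python llama_inference.py path/to/LLaMA-7B-4bit-128g \
    --wbits 4 \
    --groupsize 128 \
    --load path/to/llama-7b-4bit-128g.safetensors \
    --text "Hello, my name is"
```

This requires a CUDA GPU; the 8-bit path via bitsandbytes is a different quantization scheme and cannot be mixed with GPTQ checkpoints.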