inference example #4
by rrkotik - opened
Hello, could you provide an example of how to run inference with this model?
I tried something like this:
model = transformers.LlamaForCausalLM.from_pretrained("kuleshov/llama-7b-4bit", load_in_8bit=True, device_map='auto')
and I receive this error:
ValueError: weight is on the meta device, we need a `value` to put in on 0.
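In case it helps while waiting for an official answer, here is a minimal sketch of an inference path that avoids the 8-bit loader. It assumes the checkpoint can go through transformers' bitsandbytes 4-bit path (`load_in_4bit`, available in transformers >= 4.30 with bitsandbytes installed); if this repo actually ships pre-quantized weights (e.g. GPTQ format), that path will not apply and the matching GPTQ loading code would be needed instead, so treat this as an assumption rather than a confirmed recipe:

```python
# Hedged sketch: assumes transformers >= 4.30 and bitsandbytes are installed,
# and that the checkpoint is loadable through the bitsandbytes 4-bit path.
# If kuleshov/llama-7b-4bit stores pre-quantized (e.g. GPTQ) weights,
# a GPTQ-specific loader is needed instead of this snippet.
import torch
import transformers

model_id = "kuleshov/llama-7b-4bit"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.LlamaForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # 4-bit, not load_in_8bit, to match the checkpoint's precision
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```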
The lines of code run fine in my environment, but I am confused that the GPU memory usage is about 8 GB, which is the same as the llama-7b int8 model.
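One way to check whether the weights are actually held in 4-bit is to compare the model's reported weight footprint against live CUDA usage; nvidia-smi also counts activation buffers and the CUDA context, so it can overstate the weights. A small sketch, assuming a `model` loaded as above (a genuinely 4-bit 7B model's weights should come in well under the int8 figure, roughly half):

```python
# Sketch: report the model's weight footprint vs. allocated GPU memory.
# get_memory_footprint() is a transformers PreTrainedModel method.
import torch

print(f"weight footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
if torch.cuda.is_available():
    print(f"cuda allocated:   {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```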