inference example #4
by rrkotik - opened
Hello, could you provide an example of how to run inference with this model?
I tried something like this:
model = transformers.LlamaForCausalLM.from_pretrained("kuleshov/llama-7b-4bit", load_in_8bit=True, device_map='auto')
and I receive this error:
ValueError: weight is on the meta device, we need a `value` to put in on 0.
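In case it helps while waiting for an official answer, here is a minimal sketch of an inference path that avoids the 8-bit loader. It assumes the checkpoint can go through transformers' bitsandbytes 4-bit path (`load_in_4bit`, available in transformers >= 4.30 with bitsandbytes installed); if this repo actually ships pre-quantized weights (e.g. GPTQ format), that path will not apply and the matching GPTQ loading code would be needed instead, so treat this as an assumption rather than a confirmed recipe:

```python
# Hedged sketch: assumes transformers >= 4.30 and bitsandbytes are installed,
# and that the checkpoint is loadable through the bitsandbytes 4-bit path.
# If kuleshov/llama-7b-4bit stores pre-quantized (e.g. GPTQ) weights,
# a GPTQ-specific loader is needed instead of this snippet.
import torch
import transformers

model_id = "kuleshov/llama-7b-4bit"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.LlamaForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,   # 4-bit, not load_in_8bit, to match the checkpoint's precision
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```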
The lines of code run fine in my environment, but I am confused that the GPU memory usage is about 8 GB, which is the same as the llama-7b int8 model.
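One way to check whether the weights are actually held in 4-bit is to compare the model's reported weight footprint against live CUDA usage; nvidia-smi also counts activation buffers and the CUDA context, so it can overstate the weights. A small sketch, assuming a `model` loaded as above (a genuinely 4-bit 7B model's weights should come in well under the int8 figure, roughly half):

```python
# Sketch: report the model's weight footprint vs. allocated GPU memory.
# get_memory_footprint() is a transformers PreTrainedModel method.
import torch

print(f"weight footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
if torch.cuda.is_available():
    print(f"cuda allocated:   {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```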