Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#9, opened by smshr
I used the following code to load the model:
`import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
device = torch.device('cuda')
model_path = 'openlm-research/open_llama_3b'
#model_path = 'openlm-research/open_llama_7b'
#model_path = 'openlm-research/open_llama_13b'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)
`
But when generating output, I get the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
Any leads on how to solve it?
device = torch.device('cuda')
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
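This moves the input onto the GPU. The tokenizer returns CPU tensors, while the error shows the model weights are on `cuda:0`, so the input IDs have to be moved to the same device before generating.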
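For reference, here is a minimal sketch of the whole generation step with that fix applied. The helper name `generate_text` and the `max_new_tokens` default are my own choices for illustration, not something from the model card. It uses `model.device` instead of a hard-coded `'cuda'`, on the assumption that the embedding layer sits on the model's first device, which is the usual case with `device_map='auto'`.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Tokenize `prompt`, move the IDs to the model's device, then generate.

    `generate_text` is a hypothetical helper; `model` and `tokenizer` are the
    objects loaded with from_pretrained() in the question.
    """
    # The tokenizer returns CPU tensors by default.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Move them to where the model lives (e.g. cuda:0) to avoid the
    # "found at least two devices" RuntimeError.
    input_ids = input_ids.to(model.device)
    output_ids = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    # generate() returns a batch of sequences; decode the first one.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the model and tokenizer from the question already loaded, you would call it as `print(generate_text(model, tokenizer, prompt))`, where `prompt` is whatever string you want to complete.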