Getting an error when loading the model
Can you please share the script you used to run the model? With the script provided in this repo, I am getting "RuntimeError: LayerNormKernelImpl not implemented for Half". Thank you!
Are you running on CPU?
Yes, I am using CPU only. I changed the load mode to CPU as well. I have 96GB of RAM.
In that case, can you try removing the torch_dtype argument? CPUs don't support half-precision (i.e., float16).
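For example, the load call on CPU would then look something like this (a minimal sketch, keeping everything else in your script as-is):

from transformers import AutoModelForCausalLM

# Without torch_dtype, weights load in float32 by default, which the CPU kernels support
model = AutoModelForCausalLM.from_pretrained(".", device_map="cpu", trust_remote_code=True)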
Sure, I will share the results here soon.
So I should remove the torch_dtype=bfloat16 parameter from the script provided here, right?
Thanks!
Yes exactly, let us know if it works :)
So, I changed the script and removed the bfloat16 parameter, but I am still getting an error.
Below is the code I used to run the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model from the current directory (torch_dtype removed, CPU only)
tokenizer = AutoTokenizer.from_pretrained(".", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(".", device_map="cpu", trust_remote_code=True)

# Build a chat-formatted prompt and generate on CPU
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cpu")
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
The code above is saved in dbrx_run.py, which sits in the same folder as the model weights. Here is the error I get when I run it:
user:~/Downloads/text-generation-webui-main/models/PrunaAI_dbrx-instruct-bnb-4bit$ python3 dbrx_run.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 15/15 [00:02<00:00, 5.42it/s]
Setting `pad_token_id` to `eos_token_id`:100257 for open-end generation.
Traceback (most recent call last):
File "/home/user/Downloads/text-generation-webui-main/models/PrunaAI_dbrx-instruct-bnb-4bit/dbrx_run.py", line 12, in <module>
outputs = model.generate(**input_ids, max_new_tokens=100)
File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/home/user/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/52aabf1d1280c4a1f0425f8bc3554b66c1318007/modeling_dbrx.py", line 1307, in forward
outputs = self.transformer(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/52aabf1d1280c4a1f0425f8bc3554b66c1318007/modeling_dbrx.py", line 1105, in forward
block_outputs = block(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/52aabf1d1280c4a1f0425f8bc3554b66c1318007/modeling_dbrx.py", line 886, in forward
resid_states, hidden_states, self_attn_weights, present_key_value = self.norm_attn_norm(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/52aabf1d1280c4a1f0425f8bc3554b66c1318007/modeling_dbrx.py", line 666, in forward
hidden_states = self.norm_1(hidden_states).to(hidden_states.dtype)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 201, in forward
return F.layer_norm(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2546, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Do I need to change some parameters in the config.json file as well? If not, let me know how to fix this. Thanks!
I have two suggestions:
1- Pass torch_dtype=torch.float32 when loading the model (see the sketch below)
2- Change L35 of the config.json here to float32 (https://huggingface.co/PrunaAI/dbrx-instruct-bnb-4bit/blob/3b64c668f5fc525408170ca6565e347b9f95103f/config.json#L35)
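For suggestion 1, the load call would look something like this (a minimal sketch; the rest of your script stays unchanged):

import torch
from transformers import AutoModelForCausalLM

# Explicit full precision, so the CPU LayerNorm kernel is supported
model = AutoModelForCausalLM.from_pretrained(
    ".",
    device_map="cpu",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)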
Thanks! I will try it out tonight. I also have 36GB of VRAM. Is it possible to offload part of the model to the GPU and the rest to CPU (RAM)? If yes, let me know, and I will try out both methods tonight. Thanks!
I would suggest trying both methods at the same time.
Regarding your second question, if you set device_map='auto', you should be able to use both your GPU and CPU.
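A minimal sketch of that (the max_memory caps below are illustrative values and the argument itself is optional; adjust them to your 36GB of VRAM and 96GB of RAM):

from transformers import AutoModelForCausalLM

# Let accelerate split the layers across GPU 0 and CPU RAM
model = AutoModelForCausalLM.from_pretrained(
    ".",
    device_map="auto",
    max_memory={0: "34GiB", "cpu": "90GiB"},  # illustrative caps, not required
    trust_remote_code=True,
)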
OK, I tested the CPU-only method. I am now getting this error:
user:~/Downloads/text-generation-webui-main/models/PrunaAI_dbrx-instruct-bnb-4bit$ python3 dbrx_run.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 15/15 [00:02<00:00, 6.28it/s]
Setting `pad_token_id` to `eos_token_id`:100257 for open-end generation.
Traceback (most recent call last):
File "/home/user/Downloads/text-generation-webui-main/models/PrunaAI_dbrx-instruct-bnb-4bit/dbrx_run.py", line 12, in <module>
outputs = model.generate(**input_ids, max_new_tokens=100)
File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/home/user/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 1307, in forward
outputs = self.transformer(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 1105, in forward
block_outputs = block(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 886, in forward
resid_states, hidden_states, self_attn_weights, present_key_value = self.norm_attn_norm(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 668, in forward
hidden_states, attn_weights, past_key_value = self.attn(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 316, in forward
qkv_states = self.Wqkv(hidden_states)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 429, in forward
out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 577, in matmul_4bit
return MatMul4Bit.apply(A, B, out, bias, quant_state)
File "/home/user/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 516, in forward
output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1082, in dequantize_4bit
device = pre_call(A.device)
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 437, in pre_call
torch.cuda.set_device(device)
File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 406, in set_device
device = _get_device_index(device)
File "/home/user/.local/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
raise ValueError(f"Expected a cuda device, but got: {device}")
ValueError: Expected a cuda device, but got: cpu
When I change to device_map='auto', I also get an error. This is the error for device_map='auto':
user:~/Downloads/text-generation-webui-main/models/PrunaAI_dbrx-instruct-bnb-4bit$ python3 dbrx_run.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 15/15 [00:07<00:00, 1.91it/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Setting `pad_token_id` to `eos_token_id`:100257 for open-end generation.
Traceback (most recent call last):
File "/home/user/Downloads/text-generation-webui-main/models/PrunaAI_dbrx-instruct-bnb-4bit/dbrx_run.py", line 12, in <module>
outputs = model.generate(**input_ids, max_new_tokens=100)
File "/home/user/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/home/user/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 1307, in forward
outputs = self.transformer(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 1105, in forward
block_outputs = block(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 886, in forward
resid_states, hidden_states, self_attn_weights, present_key_value = self.norm_attn_norm(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 668, in forward
hidden_states, attn_weights, past_key_value = self.attn(
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/user/.cache/huggingface/modules/transformers_modules/SinclairSchneider/dbrx-instruct-quantization-fixed/dfa405627499b4934c7d63132f7a3002ecf97d1e/modeling_dbrx.py", line 316, in forward
qkv_states = self.Wqkv(hidden_states)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 161, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 347, in pre_forward
set_module_tensor_to_device(
File "/home/user/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 411, in set_module_tensor_to_device
new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 313, in to
return self._quantize(device)
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 280, in _quantize
w_4bit, quant_state = bnb.functional.quantize_4bit(
File "/home/user/.local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1009, in quantize_4bit
raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}")
ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.uint8
@johnrachwanpruna let me know if there is a fix for the above issues. Thanks!
Did you also put device_map='cpu' here? And is the input also on CPU?
@johnrachwanpruna yes, I changed the parameter to device_map='cpu'. Regarding the inputs, yes, I also moved them to CPU. I could not find any similar issues via a Google search. Then I tried device_map='auto', which also resulted in a different error, as shared above.
@johnrachwanpruna let me know if there is a workaround for this issue. Thanks!