BFloat16 is not supported on MPS
#13
by RDY97 - opened
Running the demo code on macOS, the console prints:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/utils/modeling.py:1363: UserWarning: Current model requires 113249664 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
warnings.warn(
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/randomyang/Llama3_demo/try.py", line 6, in <module>
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/modeling_utils.py", line 886, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 399, in set_module_tensor_to_device
new_value = value.to(device)
^^^^^^^^^^^^^^^^
TypeError: BFloat16 is not supported on MPS
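A possible workaround (not confirmed in this thread): the error comes from loading the checkpoint in bfloat16, which the MPS backend rejects, so requesting float16 at load time usually avoids it. A minimal sketch, assuming the demo calls `AutoModelForCausalLM.from_pretrained` with `torch_dtype=torch.bfloat16`; the model ID below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"  # placeholder -- use the ID from the demo script

# Request float16 weights instead of bfloat16, which MPS cannot hold here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # instead of torch.bfloat16
    device_map="auto",          # accelerate places layers on MPS on Apple Silicon
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```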
Same here. Does anyone have a solution?
Mac users are advised to use ollama:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q8
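For reference, once the model has been pulled, Ollama also serves a local HTTP API on port 11434, so the same model can be called from Python. A minimal sketch, assuming the default endpoint and the model name from the command above:

```python
import json
import urllib.request

payload = {
    "model": "wangshenzhi/llama3-8b-chinese-chat-ollama-q8",
    "prompt": "你好，请介绍一下你自己。",
    "stream": False,  # ask for one complete JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Print the generated text returned by the local Ollama server.
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```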
Yes, I just tried this and it worked for me! But I'm curious why ollama can run this locally without the float problem.
Ollama is based on llama.cpp, not PyTorch.
PyTorch's MPS backend is still immature, with many problems and bugs.
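As a side note (my own sketch, not from this thread), scripts that need to run on both CUDA and Apple Silicon can pick a dtype based on what the backend actually supports:

```python
import torch

# Choose a dtype the local PyTorch backend can handle:
# bfloat16 on CUDA GPUs that support it, float16 on MPS, float32 otherwise.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
elif torch.backends.mps.is_available():
    dtype = torch.float16  # MPS rejects bfloat16 in this load path
else:
    dtype = torch.float32

print(f"Using dtype: {dtype}")
```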
shenzhi-wang changed discussion status to closed