BFloat16 is not supported on MPS
#13
by RDY97 - opened
Running the demo code on macOS, the console prints:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/utils/modeling.py:1363: UserWarning: Current model requires 113249664 bytes of buffer for offloaded layers, which seems does not fit any GPU's remaining memory. If you are experiencing a OOM later, please consider using offload_buffers=True.
warnings.warn(
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/randomyang/Llama3_demo/try.py", line 6, in <module>
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/modeling_utils.py", line 886, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 399, in set_module_tensor_to_device
new_value = value.to(device)
^^^^^^^^^^^^^^^^
TypeError: BFloat16 is not supported on MPS
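A possible workaround (not confirmed in this thread): the error comes from loading the checkpoint in bfloat16, which the MPS backend rejects, so requesting float16 at load time usually avoids it. A minimal sketch, assuming the demo calls `AutoModelForCausalLM.from_pretrained` with `torch_dtype=torch.bfloat16`; the model ID below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"  # placeholder -- use the ID from the demo script

# Request float16 weights instead of bfloat16, which MPS cannot hold here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # instead of torch.bfloat16
    device_map="auto",          # accelerate places layers on MPS on Apple Silicon
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```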
Same here. Does anyone have a solution?
Mac users are advised to use ollama:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q8
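For reference, once the model has been pulled, Ollama also serves a local HTTP API on port 11434, so the same model can be called from Python. A minimal sketch, assuming the default endpoint and the model name from the command above:

```python
import json
import urllib.request

payload = {
    "model": "wangshenzhi/llama3-8b-chinese-chat-ollama-q8",
    "prompt": "你好，请介绍一下你自己。",
    "stream": False,  # ask for one complete JSON response instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Print the generated text returned by the local Ollama server.
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```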
Yes, I just tried this and it worked for me! But I'm curious why ollama can run this locally without the float problem.
Ollama is based on llama.cpp, not PyTorch.
PyTorch's MPS backend is still immature, with many problems and bugs.
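As a side note (my own sketch, not from this thread), scripts that need to run on both CUDA and Apple Silicon can pick a dtype based on what the backend actually supports:

```python
import torch

# Choose a dtype the local PyTorch backend can handle:
# bfloat16 on CUDA GPUs that support it, float16 on MPS, float32 otherwise.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
elif torch.backends.mps.is_available():
    dtype = torch.float16  # MPS rejects bfloat16 in this load path
else:
    dtype = torch.float32

print(f"Using dtype: {dtype}")
```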
shenzhi-wang changed discussion status to closed