Unable to load the model
#1 opened by NamburiSrinath
Hi,
I was trying to load the model with transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('RedHatAI/Llama-2-7b-chat-quantized.w8a8')
print(model)
but it resulted in the error below:
File "/quantization/compress_vllm.py", line 4, in <module>
model = AutoModelForCausalLM.from_pretrained('RedHatAI/Llama-2-7b-chat-quantized.w8a8')
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 573, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 272, in _wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4422, in from_pretrained
hf_quantizer.preprocess_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/base.py", line 215, in preprocess_model
return self._process_model_before_weight_loading(model, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_compressed_tensors.py", line 121, in _process_model_before_weight_loading
raise ValueError("`run_compressed` is only supported for quantized_compressed models")
ValueError: `run_compressed` is only supported for quantized_compressed models
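
For what it's worth, the compressed-tensors integration in transformers documents a run_compressed option on CompressedTensorsConfig, and passing run_compressed=False is supposed to load the checkpoint in decompressed form rather than with the compressed kernels. Is that the intended way to load this repo in transformers? Below is only a minimal sketch of what I mean, assuming the installed transformers version exposes CompressedTensorsConfig with a run_compressed argument; I have not confirmed it resolves this error:

from transformers import AutoModelForCausalLM, AutoTokenizer, CompressedTensorsConfig

model_id = 'RedHatAI/Llama-2-7b-chat-quantized.w8a8'

# Assumption: run_compressed=False asks the compressed-tensors quantizer to
# decompress the weights at load time instead of running them compressed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(model)

Alternatively, since these w8a8 checkpoints seem to be published with vLLM serving in mind, loading the repo directly with vLLM may avoid the transformers quantizer path altogether.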