32B Model Inference Issue
#6
by RachelZhou - opened
I used the following code to load the model:
```python
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, torch_dtype="bfloat16", device_map=device_map
)
```
The error occurs inside the load_pretrained_model call, when it tries to instantiate LlavaQwenForCausalLM. The same code loads lmms-lab/LLaVA-Video-7B-Qwen2 without any issues; the failure appears only with LLaVA-NeXT-Video-32B-Qwen.
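For reference, here is a minimal sketch of the working comparison case, assuming the same builder import from the LLaVA-NeXT repo; only the checkpoint name differs from the failing call above:

```python
# Sketch of the comparison call that loads successfully in my setup
# (assumes the same load_pretrained_model builder from the LLaVA-NeXT repo).
from llava.model.builder import load_pretrained_model

pretrained = "lmms-lab/LLaVA-Video-7B-Qwen2"  # 7B checkpoint: loads without errors
model_name = "llava_qwen"
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, torch_dtype="bfloat16", device_map="auto"
)
```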
Any suggestions on how to properly load this model would be greatly appreciated.
RachelZhou changed discussion title from "Need help with loading 32B model" to "32B Model Inference Issue"