32B Model Inference Issue

#6
by RachelZhou - opened

I used the following code to load the model:
```python
from llava.model.builder import load_pretrained_model  # builder from the LLaVA-NeXT repo

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"
tokenizer, model, image_processor, max_length = load_pretrained_model(pretrained, None, model_name, torch_dtype="bfloat16", device_map=device_map)
```

The error occurs during the load_pretrained_model call, while instantiating the LlavaQwenForCausalLM model. The same code loads lmms-lab/LLaVA-Video-7B-Qwen2 without any issues; the failure happens only with LLaVA-NeXT-Video-32B-Qwen.
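For what it's worth, one way I thought of narrowing down where it fails is to bypass load_pretrained_model and call from_pretrained on the model class directly. This is only a rough sketch, assuming LlavaQwenForCausalLM is importable from llava.model.language_model.llava_qwen as in my checkout of the LLaVA-NeXT repo:

```python
import torch
from transformers import AutoConfig, AutoTokenizer
# Importing this module also registers the "llava_qwen" config type with AutoConfig.
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"

# Loading the config and tokenizer first separates a config problem
# from a weight-loading problem.
config = AutoConfig.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained(pretrained)

# Instantiate the model directly with the weights sharded across the
# available GPUs, roughly mirroring what load_pretrained_model does internally.
model = LlavaQwenForCausalLM.from_pretrained(
    pretrained,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
model.eval()
```

If this raises the same error, the problem would seem to be in the 32B checkpoint or its config rather than in the builder logic, but I'm not sure how to interpret it beyond that.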

Any suggestions on how to properly load this model would be greatly appreciated.

