32B Model Inference Issue

#6
by RachelZhou - opened

I used the following code to load the model:
```python
from llava.model.builder import load_pretrained_model  # builder from the LLaVA-NeXT repo

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"
tokenizer, model, image_processor, max_length = load_pretrained_model(pretrained, None, model_name, torch_dtype="bfloat16", device_map=device_map)
```

The error occurs during the load_pretrained_model call, while instantiating the LlavaQwenForCausalLM model. The same code loads lmms-lab/LLaVA-Video-7B-Qwen2 without any issues; the failure happens only with LLaVA-NeXT-Video-32B-Qwen.
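For what it's worth, one way I thought of narrowing down where it fails is to bypass load_pretrained_model and call from_pretrained on the model class directly. This is only a rough sketch, assuming LlavaQwenForCausalLM is importable from llava.model.language_model.llava_qwen as in my checkout of the LLaVA-NeXT repo:

```python
import torch
from transformers import AutoConfig, AutoTokenizer
# Importing this module also registers the "llava_qwen" config type with AutoConfig.
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM

pretrained = "lmms-lab/LLaVA-NeXT-Video-32B-Qwen"

# Loading the config and tokenizer first separates a config problem
# from a weight-loading problem.
config = AutoConfig.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained(pretrained)

# Instantiate the model directly with the weights sharded across the
# available GPUs, roughly mirroring what load_pretrained_model does internally.
model = LlavaQwenForCausalLM.from_pretrained(
    pretrained,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
model.eval()
```

If this raises the same error, the problem would seem to be in the 32B checkpoint or its config rather than in the builder logic, but I'm not sure how to interpret it beyond that.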

Any suggestions on how to properly load this model would be greatly appreciated.

