KeyError: 'layers.30.mlp.gate.weight_packed'

#1
by SLKun - opened

vLLM API server version: 0.8.5
GPU: RTX 4090
command: python -m vllm.entrypoints.openai.api_server --model ~/.cache/huggingface/hub/models--nytopop--Qwen3-30B-A3B.w4a16/snapshots/558b19131931b4078bdb8bfb172808a6e544cc67/ --served-model-name Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1

In my case it was layers.14, but same error here.

Same here!

@nytopop Have you seen this? https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/qwen_moe_w4a16.py

Looks like there are some layers which you didn't ignore.
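
For reference, the relevant part of that example is the `ignore` list passed to `GPTQModifier`, which keeps the MoE router gates (`model.layers.*.mlp.gate`) out of the quantization recipe. A minimal sketch following the linked script; the model id, dataset, and calibration settings here are illustrative assumptions, not the exact values used for this checkpoint:

```python
# Sketch of a W4A16 MoE quantization recipe, per the linked
# qwen_moe_w4a16.py example from llm-compressor.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    # The crucial part: leave the MoE router gates (and lm_head)
    # unquantized, so the exported checkpoint never claims a
    # weight_packed tensor for the gate modules.
    ignore=["lm_head", "re:.*mlp.gate$"],
)

oneshot(
    model="Qwen/Qwen3-30B-A3B",      # base model (assumed id)
    dataset="open_platypus",          # assumed calibration dataset
    recipe=recipe,
    output_dir="Qwen3-30B-A3B-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

With the gates excluded, the quantization config stays consistent with the tensors actually saved, so vLLM stops looking for `layers.*.mlp.gate.weight_packed` at load time.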

Ah, yeah, that'll do it. Reuploaded with the gates ignored; it should work now.

@nytopop Thanks, works great now!
