KeyError: 'layers.30.mlp.gate.weight_packed'
by SLKun
vLLM API server version: 0.8.5
GPU: RTX 4090
command: python -m vllm.entrypoints.openai.api_server --model ~/.cache/huggingface/hub/models--nytopop--Qwen3-30B-A3B.w4a16/snapshots/558b19131931b4078bdb8bfb172808a6e544cc67/ --served-model-name Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1
Same issue here, though in my case it was layer 14.
Same here!
@nytopop Have you seen this? https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/qwen_moe_w4a16.py
Looks like there are some gate layers that the quantization recipe didn't ignore.
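For reference, the relevant part of that example is the `ignore` list on the quantization recipe: the MoE router gates have to stay unquantized, otherwise vLLM's weight loader can't map the packed gate tensors (like `layers.30.mlp.gate.weight_packed`) onto its MoE implementation and raises the KeyError above. A minimal sketch along the lines of the linked script, assuming llm-compressor's `GPTQModifier`/`oneshot` API; the dataset and calibration settings here are placeholders, so check the linked file for the exact recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "Qwen/Qwen3-30B-A3B"  # base model to quantize

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# The crucial part: ignore the MoE router gates (and lm_head) so they
# stay in full precision instead of being packed to W4A16.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:.*mlp.gate$"],
)

# Placeholder calibration settings; the linked example has the real ones.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

SAVE_DIR = MODEL_ID.split("/")[1] + ".w4a16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```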
Ah, yeah, that'll do it. Reuploaded with the gates ignored; it should work now.