KeyError: 'layers.30.mlp.gate.weight_packed'

#1
by SLKun - opened

vLLM API server version: 0.8.5
GPU: RTX 4090
command: python -m vllm.entrypoints.openai.api_server --model ~/.cache/huggingface/hub/models--nytopop--Qwen3-30B-A3B.w4a16/snapshots/558b19131931b4078bdb8bfb172808a6e544cc67/ --served-model-name Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1

In my case it was layers.14, but same error here.

Same here!

@nytopop Have you seen this? https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/qwen_moe_w4a16.py

Looks like there are some layers which you didn't ignore.
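
For reference, the relevant part of that example is the `ignore` list passed to `GPTQModifier`, which keeps the MoE router gates (`model.layers.*.mlp.gate`) out of the quantization recipe. A minimal sketch following the linked script; the model id, dataset, and calibration settings here are illustrative assumptions, not the exact values used for this checkpoint:

```python
# Sketch of a W4A16 MoE quantization recipe, per the linked
# qwen_moe_w4a16.py example from llm-compressor.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    # The crucial part: leave the MoE router gates (and lm_head)
    # unquantized, so the exported checkpoint never claims a
    # weight_packed tensor for the gate modules.
    ignore=["lm_head", "re:.*mlp.gate$"],
)

oneshot(
    model="Qwen/Qwen3-30B-A3B",      # base model (assumed id)
    dataset="open_platypus",          # assumed calibration dataset
    recipe=recipe,
    output_dir="Qwen3-30B-A3B-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

With the gates excluded, the quantization config stays consistent with the tensors actually saved, so vLLM stops looking for `layers.*.mlp.gate.weight_packed` at load time.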

Ah, yeah, that'll do it. Reuploaded with the gates ignored; it should work now.

@nytopop Thanks, works great now!
