Why are the new 4B and 8B models slower than the previous 7B-1M model?
The documentation says that this new version should produce better quality results than version 2.5.
I don't really see any quality improvement to my summaries (some are not as good), but more importantly, the inference speed of both the new 4B and 8B models is slower than that of the old 7B model.
I'm using vLLM on 4x RTX 3060 with weight-only FP8 compression (Marlin kernel) and 'enable_thinking=False'. I tried batch size 1 and batch size 4, with similar results. The task is summarizing a text of about 28k tokens, and I ran each task twice. I presume the second run is always much faster because of prefix caching. The new 4B model is slightly slower than the 7B model on the first run (14.45 s vs. 13.75 s) and much slower on the second run (3.92 s vs. 2.23 s). The new 8B model takes about 1.5 times as long as the old 7B on the first run (20.77 s vs. 13.75 s), and is also slower on the second run (2.94 s vs. 2.23 s). Below is the vLLM output:
Qwen2.5-7b-1M:
[00:13<00:00, 13.75s/it, est. speed input: 2063.93 toks/s, output: 14.69 toks/s] (first run)
[00:02<00:00, 2.23s/it, est. speed input: 12732.36 toks/s, output: 87.04 toks/s] (second run)
Qwen3-4b:
[00:14<00:00, 14.45s/it, est. speed input: 1964.26 toks/s, output: 16.96 toks/s] (first run)
[00:03<00:00, 3.92s/it, est. speed input: 7249.40 toks/s, output: 62.58 toks/s] (second run)
Qwen3-8b: about 3.4 second
[00:20<00:00, 20.77s/it, est. speed input: 1366.44 toks/s, output: 8.62 toks/s] (first run)
[00:02<00:00, 2.94s/it, est. speed input: 9656.35 toks/s, output: 60.90 toks/s] (second run)
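Since the three models may not have written summaries of the same length, it can help to recover the token counts from the progress lines above before comparing wall time. The sketch below is only a rough cross-check: it assumes each line covers a single prompt, so that tokens are approximately rate times elapsed seconds, and the names `LOG_LINES`, `parse_line`, and `summarize` are made up for illustration.

```python
import re

# Progress lines copied from the vLLM output above, keyed by (model, run).
LOG_LINES = {
    ("Qwen2.5-7b-1M", "first"): "[00:13<00:00, 13.75s/it, est. speed input: 2063.93 toks/s, output: 14.69 toks/s]",
    ("Qwen2.5-7b-1M", "second"): "[00:02<00:00, 2.23s/it, est. speed input: 12732.36 toks/s, output: 87.04 toks/s]",
    ("Qwen3-4b", "first"): "[00:14<00:00, 14.45s/it, est. speed input: 1964.26 toks/s, output: 16.96 toks/s]",
    ("Qwen3-4b", "second"): "[00:03<00:00, 3.92s/it, est. speed input: 7249.40 toks/s, output: 62.58 toks/s]",
    ("Qwen3-8b", "first"): "[00:20<00:00, 20.77s/it, est. speed input: 1366.44 toks/s, output: 8.62 toks/s]",
    ("Qwen3-8b", "second"): "[00:02<00:00, 2.94s/it, est. speed input: 9656.35 toks/s, output: 60.90 toks/s]",
}

PATTERN = re.compile(
    r"([\d.]+)s/it, est\. speed input: ([\d.]+) toks/s, output: ([\d.]+) toks/s"
)


def parse_line(line):
    """Extract elapsed seconds and estimate token counts from one progress line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized progress line: {line!r}")
    seconds, input_rate, output_rate = map(float, match.groups())
    return {
        "seconds": seconds,
        # Assumption: one prompt per line, so tokens ~= rate * elapsed time.
        "input_tokens": round(seconds * input_rate),
        "output_tokens": round(seconds * output_rate),
        "output_rate": output_rate,
    }


def summarize(log_lines=LOG_LINES):
    """Return parsed figures for every (model, run) pair."""
    return {key: parse_line(line) for key, line in log_lines.items()}


if __name__ == "__main__":
    for (model, run), stats in summarize().items():
        print(
            f"{model:14s} {run:6s} {stats['seconds']:6.2f} s  "
            f"in ~{stats['input_tokens']} tok  out ~{stats['output_tokens']} tok  "
            f"({stats['output_rate']:.2f} out tok/s)"
        )
```

Under that assumption, the input comes out at roughly 28.4k tokens for every run, but the output lengths differ (about 200 tokens for the 7B, about 245 for the 4B, about 179 for the 8B). Total seconds per run are therefore not a like-for-like comparison; the output toks/s on the second run, where the prompt is presumably served from the prefix cache, is probably the closest thing here to a decode-speed comparison.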
Did anybody else notice that the new 4b and 8b models are slower than the old 7b-1M model?
Am I doing something wrong or missing something?
Thanks to the Qwen team for the awesome 7B-1M model, and to anybody who can help me understand what's going on.
How are you running it? Please list your parameter settings. Qwen3 is very sensitive to parameter settings.
Is it because the new models have more layers?
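One way to probe the layer-count guess is to estimate how much KV cache each model has to read per generated token at a 28k-token context. This is a back-of-the-envelope sketch, not a measurement. The layer counts, KV head counts, and head dimension below are assumptions that do not come from this thread, so check them against each model's config.json. It also assumes the KV cache stays in 16-bit precision, since the FP8 compression mentioned above is weight-only.

```python
# ASSUMED architecture values (not from this thread); verify in each config.json.
CONFIGS = {
    "Qwen2.5-7B-1M": {"layers": 28, "kv_heads": 4, "head_dim": 128},
    "Qwen3-4B": {"layers": 36, "kv_heads": 8, "head_dim": 128},
    "Qwen3-8B": {"layers": 36, "kv_heads": 8, "head_dim": 128},
}


def kv_bytes_per_token(cfg, bytes_per_value=2):
    """Bytes of KV cache stored per token: one K and one V vector
    per layer and per KV head (bytes_per_value=2 assumes fp16/bf16)."""
    return 2 * cfg["layers"] * cfg["kv_heads"] * cfg["head_dim"] * bytes_per_value


def kv_cache_mib(cfg, context_tokens, bytes_per_value=2):
    """Total KV cache size in MiB for a given context length."""
    return kv_bytes_per_token(cfg, bytes_per_value) * context_tokens / 2**20


if __name__ == "__main__":
    context = 28_000  # roughly the prompt length reported in the question
    baseline = kv_bytes_per_token(CONFIGS["Qwen2.5-7B-1M"])
    for name, cfg in CONFIGS.items():
        per_token = kv_bytes_per_token(cfg)
        print(
            f"{name:14s} {per_token / 1024:6.0f} KiB/token  "
            f"{kv_cache_mib(cfg, context):8.1f} MiB at {context} tokens  "
            f"({per_token / baseline:.2f}x the 7B-1M)"
        )
```

If those config values are right, both new models keep about 2.6 times as much KV cache per token as the 7B-1M, because they have more layers and twice as many KV heads. With a long prompt, every decoding step has to read that whole cache, which could account for some of the slower output speed even for the smaller 4B model.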
Better results often come with some loss of speed. That's just the nature of things.