Qwen
/

Tags: Text Generation · Transformers · Safetensors · qwen3_moe · conversational · fp8

SGLang very slow: ~6 tok/s at concurrency 1 on H100 SXM

#3
by RonanMcGovern - opened

I'm using the latest SGLang Docker image (`latest` tag).

I see the same issue with the dense Qwen 32B model.

Update: the problem was in my measurement. I was not counting the reasoning tokens, which are returned in a separate field, so my tokens-per-second figure left most of the generated tokens out.
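For anyone measuring throughput the same way, here is a minimal sketch of counting both fields. It assumes SGLang's OpenAI-compatible `/v1/chat/completions` endpoint is running with a reasoning parser enabled, so thinking text arrives in `delta.reasoning_content` rather than `delta.content`. The names `StreamStats`, `tally_stream` and `tokens_per_second` are hypothetical helpers, not part of SGLang. The sketch prefers `usage.completion_tokens` when the server reports it (e.g. when streaming with `stream_options={"include_usage": True}`) and otherwise falls back to counting streamed deltas. That fallback treats one delta as roughly one token, which is only an approximation.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Optional


@dataclass
class StreamStats:
    # Hypothetical container for what we observed in a streamed response.
    content_chunks: int = 0
    reasoning_chunks: int = 0
    reported_completion_tokens: Optional[int] = None

    @property
    def total_chunks(self) -> int:
        # Both visible answer deltas and reasoning deltas are generated tokens.
        return self.content_chunks + self.reasoning_chunks


def tally_stream(chunks: Iterable[Mapping[str, Any]]) -> StreamStats:
    """Count deltas in already-parsed chat.completion.chunk dicts."""
    stats = StreamStats()
    for chunk in chunks:
        # A final usage chunk (if requested) carries the server's own token count.
        usage = chunk.get("usage")
        if usage and usage.get("completion_tokens") is not None:
            stats.reported_completion_tokens = usage["completion_tokens"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            # Assumption: reasoning text is streamed in a separate field.
            if delta.get("reasoning_content"):
                stats.reasoning_chunks += 1
            if delta.get("content"):
                stats.content_chunks += 1
    return stats


def tokens_per_second(stats: StreamStats, elapsed_s: float) -> float:
    # Prefer the server-reported count; fall back to the delta count.
    tokens = stats.reported_completion_tokens
    if tokens is None:
        tokens = stats.total_chunks
    return tokens / elapsed_s
```

Counting only `content_chunks` reproduces the mistake described above: for a reasoning model most of the output lands in `reasoning_content`, so the measured rate comes out far too low.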

RonanMcGovern changed discussion status to closed
