SGLang very slow (~6 tok/s) at concurrency 1 on H100 SXM
#3 by RonanMcGovern - opened
I'm using the latest SGLang Docker image (latest tag).
Same issue with Qwen 32B dense.
The issue was that I was not counting reasoning tokens, as they are returned in a separate field.
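For anyone hitting the same thing, here is a minimal sketch of the throughput calculation I should have done. It assumes an OpenAI-compatible chat completion response where the server reports the total number of generated tokens in `usage.completion_tokens` and returns the reasoning text in a separate `reasoning_content` field on the message. Both field names are assumptions about your server setup, and the example response and its numbers are hypothetical, so check your own JSON before relying on this.

```python
def tokens_per_second(response: dict, elapsed_s: float) -> float:
    """Throughput from the server-reported count of generated tokens.

    Assumes usage.completion_tokens covers every generated token,
    reasoning included (an assumption; verify against your server).
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return response["usage"]["completion_tokens"] / elapsed_s


def split_output(response: dict) -> dict:
    """Return the reasoning text and the final answer text separately.

    `reasoning_content` is the assumed name of the separate field;
    it is treated as empty when absent or null.
    """
    message = response["choices"][0]["message"]
    return {
        "reasoning": message.get("reasoning_content") or "",
        "answer": message.get("content") or "",
    }


# Hypothetical response: a short visible answer after a long reasoning trace.
example = {
    "choices": [
        {
            "message": {
                "content": "The answer is 42.",
                "reasoning_content": "Let me work through this step by step...",
            }
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 600},
}

parts = split_output(example)
rate = tokens_per_second(example, elapsed_s=10.0)
print(f"answer chars: {len(parts['answer'])}, reasoning chars: {len(parts['reasoning'])}")
print(f"throughput: {rate:.1f} tok/s")
```

If you measure speed by tokenizing only `content`, the reasoning tokens are left out and the rate looks far lower than what the server is actually generating.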
RonanMcGovern changed discussion status to closed