SGLang very slow (~6 tok/s) at concurrency 1 on H100 SXM

by RonanMcGovern - opened Apr 30

Apr 30

I'm using the latest SGLang Docker image (the "latest" tag).

Same issue with Qwen 32B dense.

May 1

The issue was that I was not counting reasoning tokens in my throughput measurement, because the server returns them in a separate field from the regular content.
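The pitfall is easy to reproduce when throughput is measured by counting streamed output on the client side. The sketch below counts content and reasoning separately and shows how dropping the reasoning field deflates tokens/s. It assumes OpenAI-style streaming deltas in which reasoning text arrives in a field named `reasoning_content`. That field name is an assumption, since the thread does not name the field. The sketch also approximates one streamed chunk as one token. The helper names `count_streamed_tokens` and `tokens_per_second` are hypothetical. When the server reports its own completion-token usage, that figure is likely more reliable than counting chunks.

```python
from typing import Iterable, Mapping, Optional


def count_streamed_tokens(deltas: Iterable[Mapping[str, Optional[str]]]) -> dict:
    """Count streamed chunks per field (assumption: one chunk ~ one token).

    The "reasoning_content" field name is an assumption about the server's
    OpenAI-compatible streaming format, not something stated in the thread.
    """
    counts = {"content": 0, "reasoning_content": 0}
    for delta in deltas:
        for field in counts:
            # Only count chunks that actually carry text in this field.
            if delta.get(field):
                counts[field] += 1
    return counts


def tokens_per_second(counts: dict, elapsed_s: float, include_reasoning: bool = True) -> float:
    """Decode throughput; excluding reasoning tokens understates real speed."""
    total = counts["content"]
    if include_reasoning:
        total += counts["reasoning_content"]
    return total / elapsed_s


# Hypothetical stream: a reasoning model "thinks" before answering.
deltas = [
    {"reasoning_content": "Let"},
    {"reasoning_content": " me"},
    {"reasoning_content": " think"},
    {"content": "Hi"},
]
counts = count_streamed_tokens(deltas)
print(tokens_per_second(counts, 2.0))                           # 2.0
print(tokens_per_second(counts, 2.0, include_reasoning=False))  # 0.5
```

With a reasoning model, most of the output can sit in the reasoning field. Counting only `content` can therefore make an otherwise normal server look several times slower than it is.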

RonanMcGovern changed discussion status to closed May 1
