How to speed up inference?
#4
by vegasscientific
I tried this on 2x A6000 (48 GB each) and it takes around 35 s for a test image. On an H100 (80 GB) it still takes about 25 s per image. Is there a vLLM configuration or other example that gives faster inference? The Qwen API server is much faster; what configuration do they use?
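
For context, here is a minimal sketch of the kind of vLLM setup I mean; the model ID, prompt template, and image path are placeholders rather than anything confirmed for this repo, and the knobs I would expect to matter for latency are `tensor_parallel_size`, `dtype`, `max_model_len`, and `gpu_memory_utilization`:

```python
# Minimal sketch of a multimodal vLLM run. The model ID, chat template,
# and image path are placeholders, not necessarily what this repo ships.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",   # placeholder model ID
    tensor_parallel_size=2,              # 2 on the A6000 box, 1 on the H100
    dtype="bfloat16",
    gpu_memory_utilization=0.90,
    max_model_len=8192,                  # shorter context leaves more KV-cache headroom
    limit_mm_per_prompt={"image": 1},    # one image per request
)

sampling = SamplingParams(temperature=0.0, max_tokens=512)

# Qwen2-VL-style prompt; other checkpoints may need a different template.
prompt = (
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)
image = Image.open("test.jpg")           # placeholder test image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=sampling,
)
print(outputs[0].outputs[0].text)
```

Is there anything in a setup like this (or in how the official API server is launched) that I should change to cut the per-image latency?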