How to speed up inference?
#4
by vegasscientific
I tried this on 2x A6000 (48 GB each) and it takes around 35 s for a test image. On an H100 (80 GB) it still takes about 25 s per image. Is there a vLLM configuration or other example that gives faster inference? The Qwen API server is much faster; what configuration do they use?
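
For context, here is a minimal sketch of the kind of vLLM setup I mean; the model ID, prompt template, and image path are placeholders rather than anything confirmed for this repo, and the knobs I would expect to matter for latency are `tensor_parallel_size`, `dtype`, `max_model_len`, and `gpu_memory_utilization`:

```python
# Minimal sketch of a multimodal vLLM run. The model ID, chat template,
# and image path are placeholders, not necessarily what this repo ships.
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",   # placeholder model ID
    tensor_parallel_size=2,              # 2 on the A6000 box, 1 on the H100
    dtype="bfloat16",
    gpu_memory_utilization=0.90,
    max_model_len=8192,                  # shorter context leaves more KV-cache headroom
    limit_mm_per_prompt={"image": 1},    # one image per request
)

sampling = SamplingParams(temperature=0.0, max_tokens=512)

# Qwen2-VL-style prompt; other checkpoints may need a different template.
prompt = (
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n<|im_start|>assistant\n"
)
image = Image.open("test.jpg")           # placeholder test image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=sampling,
)
print(outputs[0].outputs[0].text)
```

Is there anything in a setup like this (or in how the official API server is launched) that I should change to cut the per-image latency?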