Why is it so slow to first token?

#3
by kweg - opened

I've been using this model for a while: https://huggingface.co/lmstudio-community/Llama-4-Scout-17B-16E-MLX-text-4bit
For short prompts it's usually around 0.3 s to first token.

I tried switching to this model because it has vision support, and 99% of the time (for the same prompts) it takes about 5 s to first token. Same settings, same everything — except, of course, the vision support. Is vision support really that expensive?
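For reference, a minimal sketch of how time to first token (TTFT) can be measured around a streaming generator. The `fake_stream` generator here is a hypothetical stand-in for a real streaming call (e.g. `mlx_lm`'s `stream_generate`); it only simulates prefill latency with a sleep:

```python
import time

def time_to_first_token(stream):
    """Return (seconds until the first token arrives, the first token)."""
    start = time.perf_counter()
    for token in stream:
        return time.perf_counter() - start, token
    return None, None  # stream produced nothing

# Hypothetical stand-in for a real model stream such as mlx_lm's stream_generate:
def fake_stream():
    time.sleep(0.05)  # simulated prompt-processing (prefill) latency
    yield "Hello"
    yield " world"

ttft, first = time_to_first_token(fake_stream())
print(f"TTFT: {ttft:.3f}s, first token: {first!r}")
```

Timing the first yielded token like this isolates prompt processing (prefill) from per-token decode speed, which is where text-only and vision variants of a model tend to differ most.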
