Why is it so slow to first token?
#3 · opened by kweg
I've been using this model for a while: https://huggingface.co/lmstudio-community/Llama-4-Scout-17B-16E-MLX-text-4bit
For short prompts it's usually 0.3s to first token.
I tried switching to this model because it has vision support, and 99% of the time (for the same prompts) it's 5s to first token. Same settings, same everything, except of course the vision support. Is vision support really that expensive?