Thank you! Is it possible to run this with vLLM or SGLang?

#18
by getfit - opened

Llama4ForConditionalGeneration has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 04-05 16:31:32 [transformers.py:119] Using Transformers backend.
WARNING 04-05 16:31:32 [config.py:3692] torch.compile is turned on, but the model models/Llama4-scout-17B-Instruct does not support it. Please open an issue on GitHub if you want it to be supported.
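
For context, that fallback fires when vLLM has no native implementation of an architecture and routes the model through its generic Transformers backend instead. A minimal sketch of the kind of Python invocation that produces the log above (the local model path and the max_model_len value are illustrative, not from the thread):

```python
from vllm import LLM, SamplingParams

# Illustrative local checkpoint path matching the log above; substitute your own.
llm = LLM(model="models/Llama4-scout-17B-Instruct", max_model_len=8192)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, Llama 4!"], params)
print(outputs[0].outputs[0].text)
```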

Meta Llama org

It should be fixed now.

Still not working:

AttributeError: 'Llama4Config' object has no attribute 'vocab_size'

Caused by the flag --max-model-len 65536.
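
For anyone hitting this: a minimal sketch of what the traceback points at, assuming the composite Llama4Config layout in transformers at the time. The top-level config exposes no vocab_size; it sits on the nested text config (repo id shown for illustration only):

```python
from transformers import AutoConfig

# Repo id assumed for illustration; the thread concerns the Scout checkpoint.
config = AutoConfig.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")

# The composite config itself has no top-level vocab_size, hence the
# AttributeError above on affected vLLM builds...
print(hasattr(config, "vocab_size"))   # False

# ...the vocabulary size lives on the nested text config instead.
print(config.text_config.vocab_size)
```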


I cannot get this to work in vLLM. I saw a post on X saying it's supposed to work, but I also see a PR for Llama 4 support that has been open for hours: https://github.com/vllm-project/vllm/pull/16104

> Still not working:
>
> AttributeError: 'Llama4Config' object has no attribute 'vocab_size'

I got the same error. Let's try again once https://github.com/vllm-project/vllm/pull/16113 is merged.
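
Once it is merged, it may help to confirm the installed build actually includes the fix before retrying (a quick check; the exact release containing the PRs above is not pinned here):

```python
import vllm

# Any build predating the Llama 4 fix PRs will still raise the
# vocab_size AttributeError; check the installed version before retrying.
print(vllm.__version__)
```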
