Thank you! Is it possible to run this with vLLM or SGLang?

#18
by getfit - opened

Llama4ForConditionalGeneration has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 04-05 16:31:32 [transformers.py:119] Using Transformers backend.
WARNING 04-05 16:31:32 [config.py:3692] torch.compile is turned on, but the model models/Llama4-scout-17B-Instruct does not support it. Please open an issue on GitHub if you want it to be supported.
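
For context, that fallback fires when vLLM has no native implementation of an architecture and routes the model through its generic Transformers backend instead. A minimal sketch of the kind of Python invocation that produces the log above (the local model path and the max_model_len value are illustrative, not from the thread):

```python
from vllm import LLM, SamplingParams

# Illustrative local checkpoint path matching the log above; substitute your own.
llm = LLM(model="models/Llama4-scout-17B-Instruct", max_model_len=8192)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, Llama 4!"], params)
print(outputs[0].outputs[0].text)
```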

Meta Llama org

It should be fixed now.

Still not working:

AttributeError: 'Llama4Config' object has no attribute 'vocab_size'

Caused by the flag --max-model-len 65536.
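
For anyone hitting this: a minimal sketch of what the traceback points at, assuming the composite Llama4Config layout in transformers at the time. The top-level config exposes no vocab_size; it sits on the nested text config (repo id shown for illustration only):

```python
from transformers import AutoConfig

# Repo id assumed for illustration; the thread concerns the Scout checkpoint.
config = AutoConfig.from_pretrained("meta-llama/Llama-4-Scout-17B-16E-Instruct")

# The composite config itself has no top-level vocab_size, hence the
# AttributeError above on affected vLLM builds...
print(hasattr(config, "vocab_size"))   # False

# ...the vocabulary size lives on the nested text config instead.
print(config.text_config.vocab_size)
```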


I cannot get this to work in vLLM. I saw a post on X saying it's supposed to work, but I also see a PR for Llama 4 support that has been open for hours: https://github.com/vllm-project/vllm/pull/16104

> Still not working:
>
> AttributeError: 'Llama4Config' object has no attribute 'vocab_size'

I got the same error. Let's try again once https://github.com/vllm-project/vllm/pull/16113 is merged.
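
Once it is merged, it may help to confirm the installed build actually includes the fix before retrying (a quick check; the exact release containing the PRs above is not pinned here):

```python
import vllm

# Any build predating the Llama 4 fix PRs will still raise the
# vocab_size AttributeError; check the installed version before retrying.
print(vllm.__version__)
```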
