Any luck running this on vLLM?

#1 by ceoofcapybaras

I'm getting `ValueError: There is no module or parameter named 'lm_head.weight_scale' in MolmoForCausalLM`, even when specifying --quantization=compressed_tensors.
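For reference, a minimal sketch of the kind of invocation that hits this error, assuming the checkpoint is served straight from this repo (the repo ID below is a placeholder, not the actual model ID):

```bash
# Serve the pre-quantized Molmo checkpoint with vLLM, forcing the
# compressed-tensors quantization backend (placeholder repo ID).
vllm serve <this-quantized-molmo-repo> \
  --trust-remote-code \
  --quantization compressed_tensors
```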

Nope, I'm getting the same error when trying with vLLM :/
I also tried to requantize with the current version of llmcompressor, but Molmo can't be loaded anymore.
What does seem to work with vLLM is in-flight quantization: vllm serve allenai/Molmo-7B-D-0924 --trust-remote-code --quantization fp8.
However, this might lower performance compared to a pre-quantized model.
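If the serve route works for you, the same in-flight FP8 path should also be reachable from vLLM's offline Python API. A minimal sketch, assuming a vLLM version with FP8 support and a GPU that supports it; the prompt and sampling settings are just illustrative:

```python
from vllm import LLM, SamplingParams

# Load the original (unquantized) Molmo checkpoint and let vLLM quantize
# the weights to FP8 at load time, mirroring the `vllm serve ... --quantization fp8`
# command above.
llm = LLM(
    model="allenai/Molmo-7B-D-0924",
    trust_remote_code=True,   # Molmo ships custom modeling code
    quantization="fp8",       # in-flight FP8 quantization
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Describe Molmo in one sentence."], params)
print(outputs[0].outputs[0].text)
```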
