Any luck running this on vLLM?

#1 by ceoofcapybaras

I'm getting `ValueError: There is no module or parameter named 'lm_head.weight_scale' in MolmoForCausalLM`, even when specifying --quantization=compressed_tensors.
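For reference, a minimal sketch of the kind of invocation that hits this error, assuming the checkpoint is served straight from this repo (the repo ID below is a placeholder, not the actual model ID):

```bash
# Serve the pre-quantized Molmo checkpoint with vLLM, forcing the
# compressed-tensors quantization backend (placeholder repo ID).
vllm serve <this-quantized-molmo-repo> \
  --trust-remote-code \
  --quantization compressed_tensors
```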

Nope, I'm getting the same error when trying with vLLM :/
I also tried to requantize with the current version of llmcompressor, but Molmo can't be loaded anymore.
What does seem to work with vLLM is in-flight quantization: vllm serve allenai/Molmo-7B-D-0924 --trust-remote-code --quantization fp8.
However, this might lower performance compared to a pre-quantized model.
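If the serve route works for you, the same in-flight FP8 path should also be reachable from vLLM's offline Python API. A minimal sketch, assuming a vLLM version with FP8 support and a GPU that supports it; the prompt and sampling settings are just illustrative:

```python
from vllm import LLM, SamplingParams

# Load the original (unquantized) Molmo checkpoint and let vLLM quantize
# the weights to FP8 at load time, mirroring the `vllm serve ... --quantization fp8`
# command above.
llm = LLM(
    model="allenai/Molmo-7B-D-0924",
    trust_remote_code=True,   # Molmo ships custom modeling code
    quantization="fp8",       # in-flight FP8 quantization
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Describe Molmo in one sentence."], params)
print(outputs[0].outputs[0].text)
```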
