Any luck running this on vLLM?
#1 by ceoofcapybaras - opened
I'm getting `ValueError: There is no module or parameter named 'lm_head.weight_scale' in MolmoForCausalLM`, even when specifying `--quantization=compressed_tensors`.
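For reference, this is roughly how I'm loading it (the repo id below is just a placeholder for this quantized checkpoint, and the CLI flag corresponds to the `quantization` argument of vLLM's Python API):

```python
from vllm import LLM

# Placeholder repo id for the pre-quantized Molmo checkpoint from this repo.
llm = LLM(
    model="<this-quantized-molmo-repo>",
    trust_remote_code=True,              # Molmo ships custom modeling code
    quantization="compressed_tensors",   # same as --quantization=compressed_tensors on vllm serve
)
# Fails with:
# ValueError: There is no module or parameter named 'lm_head.weight_scale' in MolmoForCausalLM
```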
Nope, getting the same error when trying with vLLM :/
I also tried to requantize with the current version of llmcompressor, but Molmo can't be loaded anymore.
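For context, this is roughly the oneshot FP8 flow from the llm-compressor examples that I was following (import paths and the `save_compressed` argument may differ between llm-compressor versions, and whether Molmo survives this step is exactly the problem):

```python
from transformers import AutoModelForCausalLM
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "allenai/Molmo-7B-D-0924"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True
)

# Dynamic FP8 on all Linear layers, skipping lm_head
# (the layer the vLLM error above complains about).
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)
model.save_pretrained("Molmo-7B-D-0924-FP8-Dynamic", save_compressed=True)
```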
What seems to work with vLLM is in-flight quantization: `vllm serve allenai/Molmo-7B-D-0924 --trust-remote-code --quantization fp8`. However, this might lower the performance compared to a pre-quantized model.
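The same in-flight quantization should also work from the Python API if you're not going through `vllm serve`; this is an untested sketch along the lines of what that command does:

```python
from vllm import LLM, SamplingParams

# In-flight FP8 quantization of the original (unquantized) Molmo checkpoint.
llm = LLM(
    model="allenai/Molmo-7B-D-0924",
    trust_remote_code=True,
    quantization="fp8",
)

params = SamplingParams(max_tokens=64, temperature=0.0)
out = llm.generate(["Describe what Molmo is in one sentence."], params)
print(out[0].outputs[0].text)
```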