Performed my own quantization, now encountering an error while running inference with vLLM
by ryan-rozanitis-bd
Hi solidrust! Firstly, we are big fans of your quantized model! We have been using it for a few months.
Now, we want to iterate upon it using the process below.
- Finetune the base model. This results in a new model.safetensors.
- Merge this model.safetensors back into the base model by loading the finetuned state dict into the base model (sketched after this list).
- Quantize the result of step 2.
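For concreteness, here is roughly what our merge in step 2 looks like. This is a minimal sketch: the paths, dtype, and the use of `strict=False` are placeholders, and the quantization step itself is omitted.

```python
# Minimal sketch of steps 1-3 above; all paths are placeholders.
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "path/to/base-model"                # placeholder
FINETUNED = "finetuned/model.safetensors"  # placeholder
MERGED = "path/to/merged-model"            # placeholder

# Step 2: load the base model, then overlay the finetuned weights.
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
state_dict = load_file(FINETUNED)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing:", len(missing), "unexpected:", len(unexpected))  # sanity check

# Save a self-consistent checkpoint (weights + config) for the quantizer.
model.save_pretrained(MERGED)
AutoTokenizer.from_pretrained(BASE).save_pretrained(MERGED)
```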
When trying to run inference on this model with vLLM, I get the error below. Did you encounter this when you quantized the base model?

RuntimeError: start (0) + length (7168) exceeds dimension size (4096)
I don't remember, but there are usually two ways to solve that error:
- update the config.json so its dimensions match the merged checkpoint, then quantize the model again (recommended; see the sketch after this list)
- find a related setting or flag in vLLM to override the dimension size.
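If it helps, here is a rough sketch of the first option: checking that config.json's dimension fields actually match the merged checkpoint's tensor shapes before re-quantizing. The paths and tensor key names are assumptions about a typical transformers layout, not your exact files.

```python
# Rough sketch: compare config.json dimensions against the checkpoint's real
# tensor shapes. Paths and key names are assumptions, not your exact files.
import json
from safetensors import safe_open

CONFIG = "merged-model/config.json"         # placeholder
WEIGHTS = "merged-model/model.safetensors"  # placeholder

with open(CONFIG) as fh:
    config = json.load(fh)
print("hidden_size:", config.get("hidden_size"))
print("intermediate_size:", config.get("intermediate_size"))

# A mismatch here (e.g. 7168 in the config vs 4096 in the weights) is the
# kind of thing that surfaces as "start + length exceeds dimension size".
with safe_open(WEIGHTS, framework="pt") as f:
    for key in f.keys():
        if key.endswith("embed_tokens.weight"):
            shape = f.get_slice(key).get_shape()
            print(key, shape)  # last dim should equal config["hidden_size"]
```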
Suparious changed discussion status to closed