Performed my own quantization, now encountering an error while running inference with vLLM.

by ryan-rozanitis-bd - opened

Hi solidrust! Firstly, we are big fans of your quantized model! We have been using it for a few months.

Now, we want to iterate upon it using the process below.

  1. Finetune the base model. This produces a new model.safetensors.
  2. Merge these finetuned weights back into the base model by loading the finetuned safetensors state dict into it (a sketch follows this list).
  3. Quantize the result of step 2.
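
A minimal sketch of step 2, assuming the finetune produced a full-model (not adapter-only) safetensors file and the base model loads with transformers; all paths are placeholders:

```python
import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM

# Load the base model (placeholder path).
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model", torch_dtype=torch.float16
)

# Load the finetuned weights produced in step 1 (placeholder path).
finetuned_state = load_file("path/to/finetuned/model.safetensors")

# strict=True surfaces any key or shape mismatch immediately, which is
# where a hidden-size disagreement (e.g. 7168 vs 4096) would first appear.
base.load_state_dict(finetuned_state, strict=True)

# Save the merged model for quantization in step 3.
base.save_pretrained("path/to/merged-model", safe_serialization=True)
```

If load_state_dict already fails here, the mismatch exists before quantization and the finetuned checkpoint does not match the base architecture.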

When trying to use the resulting model with vLLM, I get an error like the one below. Did you encounter this when you quantized the base model?
RuntimeError: start (0) + length (7168) exceeds dimension size (4096).

SolidRusT Networks org

I don't remember, but there are usually two ways to solve that error:

  • update the config.json, then quantize the model again (recommended)
  • find a related setting or flag in vLLM to override the dimension size
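
For the first option, a rough way to check whether config.json agrees with the tensor shapes actually stored in the merged checkpoint, assuming a single-file checkpoint and Llama-style tensor names (both are assumptions; sharded checkpoints and other architectures use different file and tensor names):

```python
import json
from safetensors import safe_open

# Placeholder paths into the merged model directory.
with open("path/to/merged-model/config.json") as f:
    config = json.load(f)

with safe_open("path/to/merged-model/model.safetensors", framework="pt") as f:
    # "model.embed_tokens.weight" is a Llama-style name; adjust as needed.
    embed = f.get_tensor("model.embed_tokens.weight")

print("config hidden_size:        ", config["hidden_size"])
print("checkpoint embedding width:", embed.shape[-1])
# If these disagree (e.g. 7168 vs 4096), fix config.json to match the
# checkpoint, then quantize again.
```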
Suparious changed discussion status to closed
