What was the reason for the skipped layers?
Just curious, what was the reason for the skipped layers?
GPTQModifier(
    targets="Linear",
    scheme="W8A8",
    ignore=["lm_head", "re:.*125.*", "re:.*134.*", "re:.*143.*", "re:.*149.*"],
    dampening_frac=0.01,
    offload_hessians=False,
),
See my note on the model card:
Note that layers 125, 134, 143 and 149 had to be excluded from GPTQ quantization because their extreme size would lead to allocations of 600+GB Hessian matrices for GPTQ (which couldn't be offloaded for some reason). Furthermore, the GPU memory allocation code in calculate_offload_device_map() was adjusted.
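For context, GPTQ keeps a Hessian per quantized layer whose size scales with the layer's input dimension (roughly in_features x in_features in FP32), so a single very wide layer can dominate memory on its own. A minimal back-of-the-envelope sketch of that scaling (the helper name and the 400k example width are illustrative, not taken from this model):

def gptq_hessian_gib(in_features: int, dtype_bytes: int = 4) -> float:
    # Approximate memory for one GPTQ Hessian: in_features x in_features in FP32.
    return in_features**2 * dtype_bytes / 1024**3

# A typical 8192-wide linear layer needs a modest Hessian ...
print(f"{gptq_hessian_gib(8192):.2f} GiB")     # ~0.25 GiB

# ... while an extremely wide layer blows past any single-device budget.
print(f"{gptq_hessian_gib(400_000):.0f} GiB")  # ~596 GiB, i.e. the 600+GB range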
I see, that makes sense. Thanks for explaining!