What was the reason for the skipped layers?

#2
by OwenArli - opened

Just curious, what was the reason for the skipped layers?

    GPTQModifier(
        targets="Linear",
        scheme="W8A8",
        ignore=["lm_head", "re:.*125.*", "re:.*134.*", "re:.*143.*", "re:.*149.*"],
        dampening_frac=0.01,
        offload_hessians=False,
    ),

See my note on the model card:

Note that layers 125, 134, 143 and 149 had to be excluded from GPTQ quantization because their extreme size would lead to the allocation of 600+ GB Hessian matrices for GPTQ (which could not be offloaded for some reason). Furthermore, the GPU memory allocation code in calculate_offload_device_map() was adjusted.
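
For context, here is a rough sketch of how that ignore list plugs into the rest of the quantization run. Treat the import paths, the calculate_offload_device_map() arguments, the placeholder model ID and the calibration settings as assumptions based on the llm-compressor large-model examples; they may differ between library versions and from the exact script used for this repo.

    # Minimal sketch, not the exact script used here. Assumes an llm-compressor
    # version that still ships calculate_offload_device_map(); MODEL_ID, num_gpus
    # and the calibration settings below are placeholders.
    import torch

    from transformers import AutoModelForCausalLM

    from llmcompressor.modifiers.quantization import GPTQModifier
    from llmcompressor.transformers import oneshot
    from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

    MODEL_ID = "your/model-id"  # hypothetical placeholder

    # Plan the GPU/CPU split up front, reserving headroom for the GPTQ Hessians.
    # This is the helper whose GPU memory allocation code was adjusted.
    device_map = calculate_offload_device_map(
        MODEL_ID, reserve_for_hessians=True, num_gpus=1, torch_dtype=torch.bfloat16
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map=device_map, torch_dtype=torch.bfloat16
    )

    recipe = [
        # Skip lm_head plus layers 125, 134, 143 and 149, whose Hessians would
        # each require 600+ GB and could not be offloaded.
        GPTQModifier(
            targets="Linear",
            scheme="W8A8",
            ignore=["lm_head", "re:.*125.*", "re:.*134.*", "re:.*143.*", "re:.*149.*"],
            dampening_frac=0.01,
            offload_hessians=False,
        ),
    ]

    oneshot(
        model=model,
        dataset="open_platypus",  # placeholder calibration dataset
        recipe=recipe,
        max_seq_length=2048,
        num_calibration_samples=512,
        output_dir=MODEL_ID.split("/")[-1] + "-W8A8",
    )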

I see, that makes sense. Thanks for explaining!
