Quantization Script

#1
by kawchar85 - opened

Can you share the script used for quantization?
Also, what were the memory requirements?

I essentially used this example script from llm-compressor: https://github.com/vllm-project/llm-compressor/blob/main/examples/awq/llama_example.py

I have a private HF Space that I use to run the quantizations, so I don't remember the exact memory requirements. llm-compressor is pretty tolerant of low memory since it quantizes layer by layer; it just takes longer with less VRAM.
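For reference, a minimal sketch along the lines of that example (the model ID, calibration dataset, and scheme values here are placeholder assumptions; check the linked `llama_example.py` for the exact, current recipe):

```python
# Sketch of one-shot AWQ quantization with llm-compressor, loosely based on
# the linked llama_example.py. MODEL_ID and the dataset are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# AWQ recipe: 4-bit weights, 16-bit activations; keep lm_head unquantized.
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

# oneshot calibrates and quantizes layer by layer, which is why peak VRAM
# stays low -- only one layer's activations are held at a time.
oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
)

model.save_pretrained("model-awq-w4a16", save_compressed=True)
tokenizer.save_pretrained("model-awq-w4a16")
```

The saved checkpoint can then be loaded directly by vLLM as a compressed-tensors model.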

Hope that answers the question!

Thank you
