Quantization Script
#1 by kawchar85 - opened
Can you share the script used for quantization?
Also, what were the memory requirements?
I essentially used this example script from llm-compressor: https://github.com/vllm-project/llm-compressor/blob/main/examples/awq/llama_example.py
I have a private HF Space that I use to run the quantizations, so I don't remember the exact memory requirements. llm-compressor is fairly tolerant of low memory since it quantizes layer by layer; it just takes longer with less VRAM.
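To give a rough intuition for why layer-by-layer quantization keeps memory bounded, here is a toy sketch: it is *not* AWQ itself (no activation-aware scale search, and nothing from llm-compressor's actual API), just illustrative group-wise 4-bit rounding applied to one layer's weights at a time, so only a single layer ever needs to be held in full precision.

```python
import numpy as np

def quantize_layer_w4(w: np.ndarray, group_size: int = 128):
    """Toy group-wise 4-bit quantization of one weight matrix.

    Illustrative only: real AWQ additionally searches for per-channel
    scales using calibration activations before rounding.
    """
    out_features, in_features = w.shape
    # Split each row into groups; each group gets its own scale.
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0  # int4 range -8..7
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q.reshape(out_features, in_features), scales.squeeze(-1)

def dequantize(q: np.ndarray, scales: np.ndarray, group_size: int = 128):
    out_features, in_features = q.shape
    groups = q.reshape(out_features, in_features // group_size, group_size)
    return (groups.astype(np.float32) * scales[..., None]).reshape(
        out_features, in_features
    )

# Quantize "layers" sequentially: only one layer's full-precision
# weights are in memory at a time, which is why low VRAM mostly
# costs wall-clock time rather than failing outright.
rng = np.random.default_rng(0)
for layer_idx in range(4):
    w = rng.standard_normal((256, 256)).astype(np.float32)
    q, s = quantize_layer_w4(w)
    w_hat = dequantize(q, s)
    print(f"layer {layer_idx}: max abs error {np.abs(w - w_hat).max():.4f}")
```

For the real thing, the linked `llama_example.py` in the llm-compressor repo is the authoritative recipe; this sketch only shows the memory-bounding idea behind it.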
Hope that answers the question!
Thank you