Quantization Script

#1
by kawchar85 - opened

Can you share the script used for quantization?
Also, what were the memory requirements?

I essentially used this example script from llm-compressor: https://github.com/vllm-project/llm-compressor/blob/main/examples/awq/llama_example.py

I have a private HF Space that I use to run the quantizations, so I don't remember the exact memory requirements. llm-compressor is pretty tolerant of low memory since it quantizes layer by layer; it just takes longer with less VRAM.
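For reference, a minimal sketch along the lines of that example (the model ID, calibration dataset, and scheme values here are placeholder assumptions; check the linked `llama_example.py` for the exact, current recipe):

```python
# Sketch of one-shot AWQ quantization with llm-compressor, loosely based on
# the linked llama_example.py. MODEL_ID and the dataset are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# AWQ recipe: 4-bit weights, 16-bit activations; keep lm_head unquantized.
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

# oneshot calibrates and quantizes layer by layer, which is why peak VRAM
# stays low -- only one layer's activations are held at a time.
oneshot(
    model=model,
    dataset="open_platypus",  # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
)

model.save_pretrained("model-awq-w4a16", save_compressed=True)
tokenizer.save_pretrained("model-awq-w4a16")
```

The saved checkpoint can then be loaded directly by vLLM as a compressed-tensors model.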

Hope that answers the question!

Thank you
