How much GPU memory does the AWQ quant need?

#1
by hermitg - opened

Thanks for your wonderful work!

QuantTrio org

That depends on how much context you want it to support and how many concurrent users you want to serve.
If you only want to test it with a handful of users, 4x 64 GB should be roughly enough.
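The "it depends on context and concurrency" answer can be made concrete with a back-of-the-envelope estimate. This is a minimal sketch, assuming total memory ≈ quantized weights + KV cache + a flat overhead fraction. The model dimensions, the 4-bit weight size, the FP16 KV cache (2 bytes per element), and the 10% overhead are hypothetical placeholders. They are not this model's real config, and the sketch does not model how vLLM actually allocates memory.

```python
# Rough GPU memory estimate for serving an AWQ-quantized model.
# All model dimensions used below are hypothetical placeholders,
# not the real config of the model discussed in this thread.

GIB = 1024 ** 3


def weight_bytes(num_params: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of quantized weights.

    Ignores AWQ scales/zero-points and any layers left unquantized.
    """
    return num_params * bits_per_weight / 8


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, num_users: int,
                   bytes_per_elem: int = 2) -> int:
    """KV cache size: one key and one value vector per layer, per KV head, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * num_users


def total_gib(num_params: float, num_layers: int, num_kv_heads: int,
              head_dim: int, context_len: int, num_users: int,
              overhead_frac: float = 0.10) -> float:
    """Weights + KV cache, inflated by a hypothetical overhead fraction (activations, buffers)."""
    raw = weight_bytes(num_params) + kv_cache_bytes(
        num_layers, num_kv_heads, head_dim, context_len, num_users)
    return raw * (1 + overhead_frac) / GIB


if __name__ == "__main__":
    # Hypothetical 100B-parameter model, 60 layers, 8 KV heads of dim 128.
    for users in (1, 4, 16):
        gib = total_gib(100e9, 60, 8, 128, context_len=32768, num_users=users)
        print(f"{users:>2} users @ 32k context: ~{gib:.1f} GiB total")
```

The weights are a fixed cost, while the KV cache grows linearly with both context length and the number of concurrent users. That linear growth is why the answer above hinges on those two numbers.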

Any thoughts on 2x RTX 6000 Blackwell? Or would a different quant be better?

The layers do fit, but vLLM dies after compilation no matter what settings I use.
