Is 4 x H20 96G sufficient to run this model?

#2
by milongwong - opened

We have limited resources and have the following questions:

  1. Is 4 x H20 96G sufficient to run this model?
  2. Has anyone tried running it with SGLang to get better performance?

The quantized parameters are 346GB, which is still very large.

4 x H20 96G can run it, but the context length will be very short.
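A rough back-of-the-envelope check of why the context would be short, using only the numbers in this thread. This is a sketch, not a measurement: it assumes the nominal 96 GB per card is fully usable, that the 346 GB of weights are sharded evenly across the 4 GPUs, and it ignores activations, CUDA context, and any framework overhead. The function name `memory_headroom_gb` is made up for illustration.

```python
def memory_headroom_gb(num_gpus: int, gpu_mem_gb: float, weights_gb: float) -> float:
    """Memory left after loading the weights, summed over all GPUs (in GB).

    Assumption: nominal capacity is fully usable and no runtime overhead
    is counted, so the real figure will be lower than this.
    """
    return num_gpus * gpu_mem_gb - weights_gb


num_gpus = 4
total_gb = num_gpus * 96                      # 4 x H20 96G
headroom_gb = memory_headroom_gb(num_gpus, 96, 346)
per_gpu_gb = headroom_gb / num_gpus           # assumes even sharding of weights

print(f"total GPU memory:      {total_gb} GB")
print(f"headroom after weights: {headroom_gb} GB")
print(f"headroom per GPU:       {per_gpu_gb} GB")
```

Under these assumptions only about 38 GB in total (roughly 9.5 GB per GPU) remains for the KV cache and everything else, which is why the usable context length ends up small.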
