Is 4 x H20 96G sufficient to run this model?
#2, opened by milongwong
We have limited resources and have the following questions:
- Is 4 x H20 96G sufficient to run this model?
- Has anyone tried running it with SGLang to get better throughput?
The quantized parameters alone take 346 GB, which is still very large.
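4 x H20 96G can run it, but the context length will be very short. The four cards provide 4 x 96 GB = 384 GB in total, so after loading 346 GB of weights only about 38 GB remain across all GPUs for the KV cache, activations, and other runtime overhead.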
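A rough back-of-the-envelope sketch of why the context ends up short, under stated assumptions. Only the 346 GB weight size and the 4 x 96 GB of GPU memory come from this thread. The 5% runtime overhead and the 70,000 bytes of KV cache per token are hypothetical placeholders; the real per-token KV size depends on the model's layer count, attention type, and KV dtype, and the real overhead depends on the serving framework. The sketch also assumes the leftover memory is pooled across all four GPUs. If the framework keeps a full KV copy on every GPU, the usable budget is per-GPU and correspondingly smaller.

```python
# Rough memory budget for serving a 346 GB quantized model on 4 x H20 (96 GB each).
# Only NUM_GPUS, GPU_MEM_GB and WEIGHTS_GB come from the discussion; everything
# else is a hypothetical assumption for illustration.

NUM_GPUS = 4
GPU_MEM_GB = 96    # memory per H20 card
WEIGHTS_GB = 346   # quantized parameter size

# Hypothetical: share of total memory lost to activations, CUDA graphs,
# communication buffers, etc. The real value depends on the serving framework.
RUNTIME_OVERHEAD_FRACTION = 0.05


def kv_budget_gb(num_gpus=NUM_GPUS, gpu_mem_gb=GPU_MEM_GB,
                 weights_gb=WEIGHTS_GB,
                 overhead_fraction=RUNTIME_OVERHEAD_FRACTION):
    """GB left for the KV cache, assuming memory is pooled across all GPUs."""
    total = num_gpus * gpu_mem_gb
    overhead = total * overhead_fraction
    return total - weights_gb - overhead


def max_cached_tokens(kv_gb, kv_bytes_per_token):
    """Total tokens that fit in the KV budget, shared by all concurrent requests."""
    if kv_gb <= 0:
        return 0
    return int(kv_gb * 1e9 // kv_bytes_per_token)


def max_context_per_request(total_tokens, concurrent_requests):
    """Context length each request can use if the cache is split evenly."""
    return total_tokens // concurrent_requests


if __name__ == "__main__":
    budget = kv_budget_gb()
    # Hypothetical KV size per token; replace with the real model's figure.
    tokens = max_cached_tokens(budget, kv_bytes_per_token=70_000)
    print(f"KV budget: {budget:.1f} GB")
    print(f"Cached tokens (all requests): {tokens}")
    print(f"Per request at 8 concurrent: {max_context_per_request(tokens, 8)}")
```

Because the weights take about 90% of the 384 GB, even a small change in overhead or in per-token KV size moves the usable context a lot. The total cached tokens also have to be shared between concurrent requests, which is why long contexts and high concurrency do not fit together on this setup.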