How to run the 0528 version on GPUs that don't support FP8
When I run it on an A800, it throws this error: ValueError: FP8 quantized models is only supported on GPUs with compute capability >= 8.9 (e.g 4090/H100), actual = 8.0
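The error is a plain compute-capability check. The A800 reports 8.0, which is below the 8.9 threshold in the message. Below is a minimal sketch of that comparison, assuming the threshold quoted in the error. The helper names (`supports_fp8`, `current_gpu_capability`) are hypothetical and are not taken from any serving framework. The only real library call is PyTorch's `torch.cuda.get_device_capability`, which is used only if PyTorch is installed.

```python
from typing import Optional, Tuple

# Minimum (major, minor) compute capability for FP8, as quoted in the error above.
FP8_MIN_CAPABILITY: Tuple[int, int] = (8, 9)


def supports_fp8(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the FP8 minimum."""
    return tuple(capability) >= FP8_MIN_CAPABILITY


def current_gpu_capability() -> Optional[Tuple[int, int]]:
    """Query GPU 0 through PyTorch if it is installed; return None otherwise."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = current_gpu_capability()
    if cap is None:
        print("No CUDA GPU detected (or PyTorch not installed).")
    else:
        print(f"Compute capability {cap[0]}.{cap[1]}; FP8 supported: {supports_fp8(cap)}")
```

Tuples compare element by element, so (8, 0) falls below (8, 9), while (8, 9) and (9, 0) pass.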
Have you considered that you may not have enough GPUs to run a 600B-parameter model? Unless I'm missing something.
@Micdiane, how many A800s do you have, and how much memory does each A800 have? There are different solutions depending on your requirements; I'd like to suggest the one that fits your case best.
2× A800, 80 GB each (160 GB total). That's a bit tight for a 600B LLM, haha.
It looks like even IQ2 won't fit; only IQ1 would. However, IQ1 loses a lot of quality, to the point that it no longer compares well with smaller models. If you want full FP8 precision, a CPU + GPU setup seems to be your only option, and it needs about 600 GB of CPU memory to hold the MoE weights.
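The sizing above can be checked with back-of-the-envelope arithmetic: weight memory ≈ parameters × bits per weight / 8. Here is a rough sketch under stated assumptions. The bits-per-weight figures for IQ2 and IQ1 are assumed from llama.cpp's IQ2_XXS and IQ1_S quant types, and the 20 GB headroom for KV cache, activations and CUDA context is a hypothetical number, not a measured value.

```python
# Approximate bits per weight (bpw). The IQ* values are assumptions based on
# llama.cpp's IQ2_XXS and IQ1_S quantization types.
BITS_PER_WEIGHT = {
    "FP8": 8.0,
    "IQ2_XXS": 2.0625,
    "IQ1_S": 1.5625,
}

PARAMS = 600e9              # model size discussed in this thread
GPU_BUDGET_GB = 2 * 80      # 2x A800, 80 GB each
HEADROOM_GB = 20.0          # hypothetical: KV cache, activations, CUDA context


def weight_gb(num_params: float, bpw: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bpw / 8 / 1e9


def fits(num_params: float, bpw: float, budget_gb: float, headroom_gb: float = 0.0) -> bool:
    """True if the weights plus reserved headroom fit in the GPU memory budget."""
    return weight_gb(num_params, bpw) + headroom_gb <= budget_gb


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        gb = weight_gb(PARAMS, bpw)
        ok = fits(PARAMS, bpw, GPU_BUDGET_GB, HEADROOM_GB)
        print(f"{name:8s} ~{gb:6.1f} GB weights; fits in {GPU_BUDGET_GB} GB: {ok}")
```

Under these assumptions, FP8 weights come to about 600 GB, which roughly matches the CPU memory figure above. IQ2_XXS comes to about 155 GB, which leaves almost no headroom on 160 GB. IQ1_S comes to about 117 GB.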