deepseek-ai/DeepSeek-R1-0528 · How to run the 0528 version on GPUs that don't support FP8

5 days ago

When I run it on an A800, it throws this error: ValueError: FP8 quantized models is only supported on GPUs with compute capability >= 8.9 (e.g 4090/H100), actual = 8.0
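For anyone hitting the same error, here is a minimal sketch of the check it describes. The 8.9 threshold is taken from the error message itself. The helper names `supports_fp8` and `local_capabilities` are made up for illustration and are not part of any library. The PyTorch query is optional and returns an empty list if PyTorch or CUDA is unavailable.

```python
# Threshold quoted in the error message: compute capability >= 8.9.
FP8_MIN_CAPABILITY = (8, 9)


def supports_fp8(capability):
    """Return True if a (major, minor) compute capability meets the FP8 threshold."""
    return tuple(capability) >= FP8_MIN_CAPABILITY


def local_capabilities():
    """List (major, minor) for each visible CUDA device.

    Returns an empty list when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    for idx, cap in enumerate(local_capabilities()):
        print(f"GPU {idx}: capability {cap[0]}.{cap[1]}, FP8 ok: {supports_fp8(cap)}")
```

An A800 reports 8.0, as the error shows, so `supports_fp8((8, 0))` is False, while an H100 at 9.0 passes.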

owenqwenllmwine

5 days ago

Have you considered that you may not have enough GPUs to run a 600B-parameter model? Or did I miss something?

ghostplant

5 days ago

edited 5 days ago

@Micdiane, how many A800s do you have, and how much memory does each one have? There are solutions for different setups, and I would like to suggest the one that fits your case best.

Micdiane

5 days ago

Two A800s, 2 × 80 GB in total. It's a little tough for a 600B LLM, haha.

ghostplant

5 days ago

edited 5 days ago

With 2 × 80 GB, it looks like even IQ2 will not fit; only IQ1 will. However, IQ1 degrades quality a lot, which makes the model less competitive with other, smaller models. To keep full FP8 precision, CPU + GPU offloading seems to be your only option, and that requires about 600 GB of CPU memory to hold the MoE weights.
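A rough back-of-the-envelope sketch of that reasoning, counting weights only. Everything here is an assumption for illustration: the parameter count is the published 671B total (the thread rounds it to "600B"), and the bits-per-weight figures are the nominal values for the IQ2_XXS and IQ1_S formats, picked as representatives of "IQ2" and "IQ1". Real quantized files mix formats, and the KV cache and activations need extra memory on top.

```python
# Assumption: published total parameter count; the thread rounds it to "600B".
PARAMS = 671e9
# Two 80 GB A800s, as stated in the thread.
GPU_MEMORY_GB = 2 * 80

# Assumption: nominal bits per weight for each format; real files mix
# formats per tensor, so actual sizes differ somewhat.
BITS_PER_WEIGHT = {"FP8": 8.0, "IQ2_XXS": 2.0625, "IQ1_S": 1.5625}


def weights_gb(params, bits):
    """Approximate weight storage in GB (10^9 bytes) for a given bit width."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    size = weights_gb(PARAMS, bits)
    verdict = "fits" if size < GPU_MEMORY_GB else "does not fit"
    print(f"{name}: ~{size:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions only the roughly 1.5-bit format lands under 160 GB, and even then there is little room left for the KV cache.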