Any chance for IQ3_XXS/IQ3_XS or similar size?
Hi there, thanks for the quant! I was wondering if it would be possible to get a quant of ~300GB or so, as I have 344GB of memory (between VRAM + RAM), so I can't load the IQ4 :(
For example, I can load https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/tree/main/UD-Q3_K_XL which is ~276GB.
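For a sense of scale, this is the rough arithmetic I'm using to figure out what fits; a minimal sketch, and the bpw figures are nominal approximations (real files mix tensor types, so effective bpw runs somewhat higher):

```python
# Rough GGUF file-size estimate for a ~672B-parameter DeepSeek-class model.
# bpw values are nominal llama.cpp figures; real quants mix tensor types,
# so the effective bpw (and file size) runs somewhat higher.
PARAMS = 672e9

def size_gib(bpw: float) -> float:
    """params * bits-per-weight -> bytes -> GiB."""
    return PARAMS * bpw / 8 / 2**30

for name, bpw in [("IQ3_XXS", 3.0625), ("IQ3_XS", 3.3), ("Q3_K_S", 3.5), ("IQ4_XS", 4.25)]:
    print(f"{name:8s} ~ {size_gib(bpw):5.1f} GiB")
# On top of the weights you still need headroom for KV cache and compute
# buffers, so ~300 GiB of weights is about the ceiling for 344GB total.
```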
@Panchovix I think the guy who quantized that pruned coder variant of V3-0324 has done it?
DevQuasar/tngtech.DeepSeek-R1T-Chimera-GGUF
I haven't tested them myself because I only have 248GB combined VRAM/RAM.
Yeah, I realize this quant weighs in a little heavy at 339GB, which is tight even for 256GB RAM + 96GB VRAM... Honestly I'm not sure it will even finish uploading... :fingers_crossed:
This one has a lot of iq4_ks layers, which is pretty fast on CUDA, but yeah, I don't have two RTX PRO 6000s myself either hah...
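For anyone trying it on a single big GPU instead, here's a hedged sketch of the usual MoE-offload launch, assuming ik_llama.cpp's `-ot`/`--override-tensor` support; the filename, context size, and thread count are placeholders for your own setup:

```python
import subprocess

# Hypothetical single-GPU launch: -ngl nominally offloads all layers, then
# the --override-tensor (-ot) rule pushes the routed MoE expert tensors back
# to system RAM so only attention/shared tensors occupy VRAM.
cmd = [
    "./build/bin/llama-server",
    "--model", "DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00001-of-00007.gguf",  # placeholder split name
    "-ngl", "99",
    "-ot", "exps=CPU",    # routed experts stay in system RAM
    "--ctx-size", "32768",
    "--threads", "32",
]
subprocess.run(cmd, check=True)
```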
Oh, I think I can't fit Q3_K_M (or it would be right at the limit), but I got Q3_K_S from here and it works.
https://huggingface.co/bullerwins/DeepSeek-R1T-Chimera-GGUF/tree/main/DeepSeek-R1T-Chimera-Q3_K_S
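In case it helps anyone, this is roughly how you can grab just that one quant with huggingface_hub; the `allow_patterns` glob is an assumption based on the folder layout shown in the link:

```python
from huggingface_hub import snapshot_download

# Pull only the Q3_K_S split files; the glob assumes the folder
# layout visible at the URL above.
snapshot_download(
    repo_id="bullerwins/DeepSeek-R1T-Chimera-GGUF",
    allow_patterns=["DeepSeek-R1T-Chimera-Q3_K_S/*"],
    local_dir="models",
)
```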
But I feel @ubergarm's quants could have better quality thanks to the imatrix.
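(For context, the imatrix weights quantization error by how strongly each channel actually fires on calibration data; here's a toy numpy sketch of the idea, not llama.cpp's actual code:)

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                                  # toy weights: rows x input channels
X = rng.normal(size=(1000, 16)) * np.linspace(0.1, 3.0, 16)   # calibration activations, uneven scales

# The "imatrix" idea: record mean squared activation per input channel on
# calibration text, then weight quantization error by that importance.
importance = (X ** 2).mean(axis=0)

def quantize(w, scale=0.25):
    """Crude uniform quantizer, purely for illustration."""
    return np.round(w / scale) * scale

Q = quantize(W)
plain_sse = ((W - Q) ** 2).sum()
weighted_sse = (importance * (W - Q) ** 2).sum()  # the objective an imatrix-aware quantizer targets
print(f"plain SSE {plain_sse:.2f} vs importance-weighted SSE {weighted_sse:.2f}")
```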
I'm working on the updated one right now, which might be a good size for you given that ik's recent IQ3_KS is now available:
```
llm_load_print_meta: model type = 671B
llm_load_print_meta: model ftype = IQ3_KS - 3.1875 bpw
llm_load_print_meta: model params = 672.050 B
llm_load_print_meta: model size = 281.463 GiB (3.598 BPW)
llm_load_print_meta: repeating layers = 280.155 GiB (3.591 BPW, 670.196 B parameters)
llm_load_print_meta: general.name = DeepSeek TNG R1T2 Chimera

Final estimate: PPL = 3.3167 +/- 0.01789
```
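If you want to sanity-check that size line, the arithmetic works out:

```python
params = 672.050e9  # from llm_load_print_meta above
bpw = 3.598         # effective bits per weight over all tensors
print(params * bpw / 8 / 2**30)  # ~281.5 GiB, matching the reported 281.463 GiB
```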
Hope to have it up in the next 12 hours depending on how upload goes hah (this one will be much faster lol): https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF
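(For anyone curious, the upload side is just a scripted huggingface_hub call; a minimal sketch, where the local folder path is a placeholder:)

```python
from huggingface_hub import HfApi

# Minimal sketch of the upload itself; the local folder path is a placeholder.
HfApi().upload_folder(
    folder_path="./DeepSeek-TNG-R1T2-Chimera-GGUF",
    repo_id="ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF",
    repo_type="model",
)
```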
Okay, ready to go!
Downloading!