Any chance for IQ3_XXS/IQ3_XS or similar size?
Hi there, thanks for the quant! I was wondering if it would be possible to get a quant of ~300GB or so, as I have 344GB of memory (between VRAM + RAM), so I can't load the IQ4 :(
For example, I can load https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF-UD/tree/main/UD-Q3_K_XL which is ~276GB.
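For a sense of scale, this is the rough arithmetic I'm using to figure out what fits; a minimal sketch, and the bpw figures are nominal approximations (real files mix tensor types, so effective bpw runs somewhat higher):

```python
# Rough GGUF file-size estimate for a ~672B-parameter DeepSeek-class model.
# bpw values are nominal llama.cpp figures; real quants mix tensor types,
# so the effective bpw (and file size) runs somewhat higher.
PARAMS = 672e9

def size_gib(bpw: float) -> float:
    """params * bits-per-weight -> bytes -> GiB."""
    return PARAMS * bpw / 8 / 2**30

for name, bpw in [("IQ3_XXS", 3.0625), ("IQ3_XS", 3.3), ("Q3_K_S", 3.5), ("IQ4_XS", 4.25)]:
    print(f"{name:8s} ~ {size_gib(bpw):5.1f} GiB")
# On top of the weights you still need headroom for KV cache and compute
# buffers, so ~300 GiB of weights is about the ceiling for 344GB total.
```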
@Panchovix I think the guy who quantized that pruned coder variant of V3-0324 has done it?
DevQuasar/tngtech.DeepSeek-R1T-Chimera-GGUF
I haven't tested them myself because I only have 248GB combined VRAM/RAM.
Yeah, I realize this quant weighs in a little heavy at 339GB, which is tight even for 256GB RAM + 96GB VRAM... Honestly I'm not sure it will even finish uploading... :fingers_crossed:
This one has a lot of iq4_ks layers, which is pretty fast on CUDA, but yeah, I don't have two RTX PRO 6000s myself either hah...
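For anyone trying it on a single big GPU instead, here's a hedged sketch of the usual MoE-offload launch, assuming ik_llama.cpp's `-ot`/`--override-tensor` support; the filename, context size, and thread count are placeholders for your own setup:

```python
import subprocess

# Hypothetical single-GPU launch: -ngl nominally offloads all layers, then
# the --override-tensor (-ot) rule pushes the routed MoE expert tensors back
# to system RAM so only attention/shared tensors occupy VRAM.
cmd = [
    "./build/bin/llama-server",
    "--model", "DeepSeek-TNG-R1T2-Chimera-IQ3_KS-00001-of-00007.gguf",  # placeholder split name
    "-ngl", "99",
    "-ot", "exps=CPU",    # routed experts stay in system RAM
    "--ctx-size", "32768",
    "--threads", "32",
]
subprocess.run(cmd, check=True)
```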
Oh, I think I can't fit Q3_K_M (or it would be right at the limit), but I got Q3_K_S from here and it works.
https://huggingface.co/bullerwins/DeepSeek-R1T-Chimera-GGUF/tree/main/DeepSeek-R1T-Chimera-Q3_K_S
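In case it helps anyone, this is roughly how you can grab just that one quant with huggingface_hub; the `allow_patterns` glob is an assumption based on the folder layout shown in the link:

```python
from huggingface_hub import snapshot_download

# Pull only the Q3_K_S split files; the glob assumes the folder
# layout visible at the URL above.
snapshot_download(
    repo_id="bullerwins/DeepSeek-R1T-Chimera-GGUF",
    allow_patterns=["DeepSeek-R1T-Chimera-Q3_K_S/*"],
    local_dir="models",
)
```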
But I feel @ubergarm's quants could have better quality thanks to the imatrix.
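(For context, the imatrix weights quantization error by how strongly each channel actually fires on calibration data; here's a toy numpy sketch of the idea, not llama.cpp's actual code:)

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                                  # toy weights: rows x input channels
X = rng.normal(size=(1000, 16)) * np.linspace(0.1, 3.0, 16)   # calibration activations, uneven scales

# The "imatrix" idea: record mean squared activation per input channel on
# calibration text, then weight quantization error by that importance.
importance = (X ** 2).mean(axis=0)

def quantize(w, scale=0.25):
    """Crude uniform quantizer, purely for illustration."""
    return np.round(w / scale) * scale

Q = quantize(W)
plain_sse = ((W - Q) ** 2).sum()
weighted_sse = (importance * (W - Q) ** 2).sum()  # the objective an imatrix-aware quantizer targets
print(f"plain SSE {plain_sse:.2f} vs importance-weighted SSE {weighted_sse:.2f}")
```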
I'm working on the updated one right now, which might be a good size for you given that ik's recent IQ3_KS is now available:
```
llm_load_print_meta: model type = 671B
llm_load_print_meta: model ftype = IQ3_KS - 3.1875 bpw
llm_load_print_meta: model params = 672.050 B
llm_load_print_meta: model size = 281.463 GiB (3.598 BPW)
llm_load_print_meta: repeating layers = 280.155 GiB (3.591 BPW, 670.196 B parameters)
llm_load_print_meta: general.name = DeepSeek TNG R1T2 Chimera

Final estimate: PPL = 3.3167 +/- 0.01789
```
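If you want to sanity-check that size line, the arithmetic works out:

```python
params = 672.050e9  # from llm_load_print_meta above
bpw = 3.598         # effective bits per weight over all tensors
print(params * bpw / 8 / 2**30)  # ~281.5 GiB, matching the reported 281.463 GiB
```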
Hope to have it up in the next 12 hours depending on how upload goes hah (this one will be much faster lol): https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF
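(For anyone curious, the upload side is just a scripted huggingface_hub call; a minimal sketch, where the local folder path is a placeholder:)

```python
from huggingface_hub import HfApi

# Minimal sketch of the upload itself; the local folder path is a placeholder.
HfApi().upload_folder(
    folder_path="./DeepSeek-TNG-R1T2-Chimera-GGUF",
    repo_id="ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF",
    repo_type="model",
)
```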
Okay, ready to go!
Downloading!