WIP This big one will take a bit, please be patient as it cooks and uploads!
ik_llama.cpp
imatrix Quantizations of deepcogito/cogito-v2-preview-deepseek-671B-MoE
This quant collection REQUIRES ik_llama.cpp fork to support the ik's latest SOTA quants and optimizations! Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc!
NOTE ik_llama.cpp
can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc if you want to try it out before downloading my quants.
Some of ik's new quants are supported with Nexesenex/croco.cpp fork of KoboldCPP.
These quants provide best in class perplexity for the given memory footprint.
Big Thanks
Shout out to Wendell and the Level1Techs crew, the community Forums, YouTube Channel! BIG thanks for providing BIG hardware expertise and access to run these experiments and make these great quants available to the community!!!
Also thanks to all the folks in the quanting and inferencing community on BeaverAI Club Discord and on r/LocalLLaMA for tips and tricks helping each other run, test, and benchmark all the fun new models!
Quant Collection
Perplexity computed against wiki.test.raw.
These first two are just test quants for baseline perplexity comparison:
Q8_0
665.301 GiB (8.504 BPW)- Final estimate: PPL = TODO
Q4_0
TODO GiB (TODO BPW)- Final estimate: PPL = TODO
TODO
Quick Start
CPU-Only
Note it is auto-detecting chat template incorrectly so explicitly set --chat-template deepseek3
#!/usr/bin/env bash
model=/mnt/raid/models/ubergarm/cogito-v2-preview-deepseek-671B-MoE-GGUF/cogito-v2-preview-deepseek-671B-MoE-Q8_0.gguf
numactl -N 0 -m 0 \
./build/bin/llama-server \
--model "$model"\
--alias ubergarm/cogito-v2-preview-deepseek-671B-MoE-Q8_0 \
--chat-template deepseek3 \
--ctx-size 32768 \
-ctk q8_0 \
-fa -fmoe \
-mla 3 \
--parallel 1 \
--threads 128 \
--threads-batch 192 \
--numa numactl \
--host 127.0.0.1 \
--port 8080 \
--no-mmap
References
Model tree for ubergarm/cogito-v2-preview-deepseek-671B-MoE-GGUF
Base model
deepseek-ai/DeepSeek-V3-Base