What's the hype around this model?
#1 - opened by mtcl
Would you know why this model is gaining so much attention? What is the hype all about?
60% fewer thinking tokens
lol no idea! I am rushing to convert it now, get my first Q8_0, and run some llama-server tests on it while I grab an imatrix. Then hopefully I'll get some feedback on whether the full-size model is any good while it's quantizing!
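For anyone curious about that part of the workflow, the imatrix and server-test steps look roughly like this (model and calibration file paths below are placeholders, not the exact invocation used here):

$ # compute an importance matrix from a calibration text
$ ./build/bin/llama-imatrix -m ./Model-Q8_0.gguf -f ./calibration.txt -o ./imatrix.dat
$ # spin up llama-server for some quick sanity tests
$ ./build/bin/llama-server -m ./Model-Q8_0.gguf -c 8192 --port 8080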
Ran into some hiccups: the original config.json had this single line which is not present in the original DeepSeek file:
$ cat config.json | grep head_
"head_dim": 64,
So I backed up the file, deleted that line, and am converting it now!
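If anyone wants to reproduce this, the fix plus the BF16 conversion look roughly like the following (paths are placeholders, and you can just as well edit the file by hand instead of using sed):

$ # keep a backup, then drop the offending "head_dim" line
$ cp config.json config.json.bak
$ sed -i '/"head_dim"/d' config.json
$ # convert the HF checkpoint to a BF16 GGUF with llama.cpp's converter
$ python convert_hf_to_gguf.py --outtype bf16 --outfile ./Model-BF16.gguf /path/to/hf-model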
Shard (29/30): 34%|████ | 15.5G/46.1G [04:25<08:51, 57.6Mbyte/s]
Writing: 97%|██████████| 1.31T/1.34T [4:17:11<10:50, 59.3Mbyte/s]
:fingers_crossed:
The Q8_0 is cooking!
[ 331/1147] blk.25.ffn_down_exps.weight - [ 2048, 7168, 256, 1], type = bf16, converting to q8_0 .. size = 7168.00 MiB -> 3808.00 MiB
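That log line comes from llama.cpp's quantize step; the invocation is along these lines (file names here are placeholders, not the actual paths used):

$ # requantize the BF16 GGUF down to Q8_0
$ ./build/bin/llama-quantize ./Model-BF16.gguf ./Model-Q8_0.gguf Q8_0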