What's the hype around this model?

#1
by mtcl - opened

Would you know why this model is gaining so much attention? What is the hype all about?

60% fewer thinking tokens

lol no idea! I am rushing to convert it now, get my first Q8_0, and run some llama-server tests on it while I grab an imatrix. Then hopefully I'll get some feedback on whether the full-size model is any good while quantizing it!
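For anyone following along, this is roughly the standard llama.cpp pipeline; the commands below are only a sketch with placeholder paths and filenames, not my exact invocations:

# 1. convert the HF safetensors to a BF16 GGUF (model path is a placeholder)
python convert_hf_to_gguf.py /models/the-new-model --outtype bf16 --outfile model-BF16.gguf

# 2. quantize the BF16 GGUF down to Q8_0 for initial testing
./llama-quantize model-BF16.gguf model-Q8_0.gguf Q8_0

# 3. collect an importance matrix for the smaller quants later
./llama-imatrix -m model-Q8_0.gguf -f calibration_data.txt -o imatrix.dat

# 4. serve it and kick the tires
./llama-server -m model-Q8_0.gguf --port 8080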

Ran into some hiccups given the original config.json had this single line, which is not present in the original DeepSeek file:

$ cat config.json | grep head_
  "head_dim": 64,

So I backed up the file, deleted that line, and am converting it now!
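For reference, the workaround was just something along these lines (run from the model directory; GNU sed assumed):

$ cp config.json config.json.bak
$ sed -i '/"head_dim"/d' config.json

That keeps a backup handy in case the key turns out to matter for the converted GGUF.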

Shard (29/30):  34%|β–ˆβ–ˆβ–ˆβ–Ž      | 15.5G/46.1G [04:25<08:51, 57.6Mbyte/s]
Writing:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 1.31T/1.34T [4:17:11<10:50, 59.3Mbyte/s]

:fingers_crossed:

The Q8_0 is cooking!

[ 331/1147]          blk.25.ffn_down_exps.weight - [ 2048,  7168,   256,     1], type =   bf16, converting to q8_0 .. size =  7168.00 MiB ->  3808.00 
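That size drop checks out: Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale (34 bytes per 32 weights, about 8.5 bits per weight), so 7168 MiB of bf16 at 16 bits per weight becomes 7168 × 8.5/16 = 3808 MiB.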
