What's the hype around this model?
#1 - opened by mtcl
Would you know why this model is gaining so much attention? What is the hype all about?
60% fewer thinking tokens
lol no idea! I am rushing to convert it now, get my first Q8_0, and run some llama-server tests on it while I grab an imatrix. Then hopefully I'll get some feedback on whether the full-size model is any good while it's quantizing!
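For anyone curious about that part of the workflow, the imatrix and server-test steps look roughly like this (model and calibration file paths below are placeholders, not the exact invocation used here):

$ # compute an importance matrix from a calibration text
$ ./build/bin/llama-imatrix -m ./Model-Q8_0.gguf -f ./calibration.txt -o ./imatrix.dat
$ # spin up llama-server for some quick sanity tests
$ ./build/bin/llama-server -m ./Model-Q8_0.gguf -c 8192 --port 8080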
Ran into some hiccups: the original config.json had this single line which is not present in the original DeepSeek file:
$ cat config.json | grep head_
"head_dim": 64,
So I backed up the file, deleted that line, and am converting it now!
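If anyone wants to reproduce this, the fix plus the BF16 conversion look roughly like the following (paths are placeholders, and you can just as well edit the file by hand instead of using sed):

$ # keep a backup, then drop the offending "head_dim" line
$ cp config.json config.json.bak
$ sed -i '/"head_dim"/d' config.json
$ # convert the HF checkpoint to a BF16 GGUF with llama.cpp's converter
$ python convert_hf_to_gguf.py --outtype bf16 --outfile ./Model-BF16.gguf /path/to/hf-model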
Shard (29/30): 34%|████ | 15.5G/46.1G [04:25<08:51, 57.6Mbyte/s]
Writing: 97%|██████████| 1.31T/1.34T [4:17:11<10:50, 59.3Mbyte/s]
:fingers_crossed:
The Q8_0 is cooking!
[ 331/1147] blk.25.ffn_down_exps.weight - [ 2048, 7168, 256, 1], type = bf16, converting to q8_0 .. size = 7168.00 MiB -> 3808.00 MiB
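That log line comes from llama.cpp's quantize step; the invocation is along these lines (file names here are placeholders, not the actual paths used):

$ # requantize the BF16 GGUF down to Q8_0
$ ./build/bin/llama-quantize ./Model-BF16.gguf ./Model-Q8_0.gguf Q8_0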