Why do all the scale values of attn out_proj have the non-zero mantissa 0b111010101010010101001111111101?
#22 opened 42 minutes ago
by
abcstar
Update system prompt to include tools
#21 opened about 4 hours ago
by
bchenfireworks
recommended temp?
1
#19 opened about 5 hours ago
by
createthis
Tool calling usage examples
#18 opened about 11 hours ago
by
1000Xia
Context length: is it 128K (as mentioned in the model card) or 160K (as specified in config.json)?
1
#17 opened about 11 hours ago
by
Lissanro

Front row!
#16 opened about 14 hours ago
by
MagnumYB

I wonder whether there will still be distilled models
3
#13 opened about 17 hours ago
by
BlackLeee
bro you forgot to put it under the v3.1 collection
👀
1
1
#12 opened about 17 hours ago
by
DingzhenPearl
no score on swe under thinking mode
➕
👀
2
2
#11 opened about 18 hours ago
by
vitvamer

This model’s censorship is insane
🤯
🧠
5
8
#10 opened about 19 hours ago
by
smile1030
Is the SimpleQA result a typo?
4
#9 opened about 19 hours ago
by
rnc000
Add base model metadata
#8 opened about 19 hours ago
by
davanstrien

Liang Wenfeng's garbage model
😎
🧠
3
10
#7 opened about 20 hours ago
by
eiskalt
Does search use think mode?
#6 opened about 20 hours ago
by
awdrgyjilplij
Congratulations to DeepSeek, this version seems powerful for coding and agent development
👀
❤️
1
1
#5 opened about 20 hours ago
by
Robin-Han

Downloading it right away
#4 opened about 20 hours ago
by
HowardChenRV

Any plan to release the post-training recipe?
1
#3 opened about 20 hours ago
by
Yi30
Come on, third party bros! Deploy it!
🚀
🔥
5
2
#2 opened about 21 hours ago
by
DingzhenPearl