Do not require reasoning, just the output
1
#19 opened 2 days ago
by
ameyv6
Why does the chat_template cut the <think> process out of the assistant role?
#18 opened 8 days ago
by
zhm0
Could you release an AWQ version of the model: deepseek-r1-distill-llama-70b-AWQ?
#17 opened 8 days ago
by
classdemo
Update README.md
#16 opened 12 days ago
by
shubham001213
Does DeepSeek-Llama-70B support tensor parallelism for multi-GPU inference?
1
#14 opened 19 days ago
by
Merk0701234
Weight file naming does not follow a consistent pattern
#13 opened 27 days ago
by
haili-tian
How much VRAM do you need?
8
#12 opened 30 days ago
by
hyun10
Upload IMG_4815.jpeg
#11 opened about 1 month ago
by
H3mzy11
Amazon SageMaker deployment failing with CUDA OutOfMemory error
3
#10 opened about 1 month ago
by
neelkapadia
Is <thinking> the proper tag?
4
#8 opened about 1 month ago
by
McUH
Add pipeline tag
#7 opened about 1 month ago
by
nielsr
Is SFT (non-RL) distillation this good on a sub-100B model?
3
#2 opened about 1 month ago
by
KrishnaKaasyap