What is the minimum hardware configuration for training the 70B model with llama-factory?
#20 opened 2 months ago by Lraos

Do not require reasoning, just the output
1
#19 opened 3 months ago by ameyv6
Why does the chat_template cut out the <think> process from the assistant role?
#18 opened 3 months ago by zhm0
Could you release an AWQ version of the model: deepseek-r1-distill-llama-70b-AWQ?
#17 opened 3 months ago by classdemo
Update README.md
#16 opened 3 months ago by shubham-kothari
Does DeepSeek-Llama-70B support tensor parallelism for multi-GPU inference?
1
#14 opened 3 months ago by Merk0701234
Weight file naming does not follow a consistent pattern
#13 opened 4 months ago by haili-tian
How much VRAM do you need?
8
#12 opened 4 months ago by hyun10
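The VRAM question above can be ballparked for the model weights alone. This is a rough sketch, not an official figure: it assumes 2 bytes per parameter for fp16/bf16 and 0.5 bytes for 4-bit quantization, and ignores KV cache, activations, and framework overhead, which add substantially on top.

```python
def weight_vram_gib(n_params_billion: float, bytes_per_param: float) -> float:
    """Estimate GiB needed just to hold the model weights in memory."""
    return n_params_billion * 1e9 * bytes_per_param / (1024 ** 3)

# 70B parameters in fp16/bf16 (2 bytes each) vs. 4-bit quantized (0.5 bytes each)
print(round(weight_vram_gib(70, 2), 1))    # → 130.4 (weights only)
print(round(weight_vram_gib(70, 0.5), 1))  # → 32.6 (weights only)
```

In practice this means fp16 inference needs multiple 80 GB GPUs for the weights alone, while aggressive quantization can bring the weights within reach of a two-GPU consumer setup, before accounting for context length.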
Upload IMG_4815.jpeg
#11 opened 4 months ago by H3mzy11

Amazon Sagemaker deployment failing with CUDA OutOfMemory error
3
#10 opened 4 months ago by neelkapadia
Is <thinking> the proper tag?
4
#8 opened 4 months ago by McUH
Add pipeline tag
#7 opened 4 months ago by nielsr

Template
#6 opened 4 months ago by tugot17
Is SFT (non-RL) distillation really this good on a sub-100B model?
3
#2 opened 4 months ago by KrishnaKaasyap