Do not require reasoning, just the output
1
#19 opened 2 days ago
by
ameyv6
Why does the chat_template cut the <think> process out of the assistant role?
#18 opened 8 days ago
by
zhm0
Could you release an AWQ version of the model: deepseek-r1-distill-llama-70b-AWQ?
#17 opened 8 days ago
by
classdemo
Update README.md
#16 opened 12 days ago
by
shubham001213
Does DeepSeek-Llama-70B support tensor parallelism for multi-GPU inference?
1
#14 opened 19 days ago
by
Merk0701234
Weight file naming does not follow a consistent pattern
#13 opened 27 days ago
by
haili-tian
How much VRAM do you need?
8
#12 opened 30 days ago
by
hyun10
Upload IMG_4815.jpeg
#11 opened about 1 month ago
by
H3mzy11
Amazon SageMaker deployment failing with CUDA OutOfMemory error
3
#10 opened about 1 month ago
by
neelkapadia
Is <thinking> the proper tag?
4
#8 opened about 1 month ago
by
McUH
Add pipeline tag
#7 opened about 1 month ago
by
nielsr
Is SFT (non-RL) distillation this good on a sub-100B model?
3
#2 opened about 1 month ago
by
KrishnaKaasyap