Why is model inference so slow?

#29
by LuYinMiao

It took me more than 30 minutes to answer only 5 questions. The config is:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", torch_dtype="auto").to('cuda')

# Tokenize the question
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate an answer with the model
outputs = model.generate(**inputs, max_length=1024)

# Decode the generated answer
outputs = tokenizer.decode(outputs[0], skip_special_tokens=True)
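
For reference, here is a minimal variant of the same snippet I have been trying, assuming the standard transformers generate() API. The explicit bfloat16 dtype, torch.inference_mode, max_new_tokens (which bounds only the generated tokens, whereas max_length counts the prompt too), and decoding only the new tokens are just my assumptions about what might help, not a confirmed fix; the values are illustrative.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights directly in bfloat16 so nothing stays in float32 by accident.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

inputs = tokenizer(text, return_tensors="pt").to(model.device)

# inference_mode disables autograd bookkeeping during generation.
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,  # limit only the newly generated tokens
        do_sample=False,
    )

# Decode only the tokens generated after the prompt.
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)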

Any help would be appreciated.
